Toward Enhanced State of Charge Estimation of Lithium-ion Batteries Using Optimized Machine Learning Techniques

State of charge (SOC) is a crucial index used in the assessment of electric vehicle (EV) battery storage systems. SOC estimation of lithium-ion batteries has been widely investigated because of their fast charging, long cycle life, and high energy density. However, precise SOC assessment of lithium-ion batteries remains challenging because their characteristics vary under different working environments. Machine learning techniques have been widely used to design advanced SOC estimation methods that do not require information on battery chemical reactions, battery models, internal properties, or additional filters. Here, the capacity of optimized machine learning techniques is presented toward enhanced SOC estimation in terms of learning capability, accuracy, generalization performance, and convergence speed. We validate the proposed method through lithium-ion battery experiments, EV drive cycles, temperature, noise, and aging effects. We show that the proposed method outperforms several state-of-the-art approaches in terms of accuracy, adaptability, and robustness under diverse operating conditions.

complex mathematical equations, thus leading to complications for battery model development and parameter estimation 18. On the contrary, the ML-based SOC estimation approaches utilize an influx of data and powerful processors to estimate SOC with limited prior knowledge about battery internal characteristics and chemical reactions 19,20. However, the accuracy and performance of the ML methods depend heavily on the quality and amount of the data, since unbalanced data can lead to overfitting and underfitting problems 21.
The scientific innovation of this paper is to introduce an optimized ML technique for SOC evaluation towards the advancement of sustainable EV technologies. ML techniques have received considerable attention for their learning capability, generalization performance, convergence speed, and high accuracy; hence, they are well suited to the complex and nonlinear characteristics of lithium-ion batteries. However, selecting the hyperparameters of ML algorithms by inefficient trial and error leads to computational complexity, such as slow training speed and data fitting problems, thereby delivering unsatisfactory SOC results [22][23][24]. Optimization techniques have become increasingly popular because they achieve high adaptability, improved efficiency, and high-quality results; thus, they can be employed to determine the optimal hyperparameters as well as the appropriate training algorithm and activation function of ML algorithms. Therefore, a proper combination of an ML algorithm and an optimization technique not only resolves the computational complexity of ML algorithms but also achieves excellent solutions in lithium-ion battery SOC estimation.
In this study, we present a new method for accurate SOC estimation using an ML-based optimization technique. The recurrent nonlinear autoregressive with exogenous inputs (RNARX) neural network algorithm is a well-known subclass of ML algorithms that has been widely used in designing time-series and dynamic systems. The computational capability of RNARX is enhanced by using the lightning search algorithm (LSA), thereby increasing SOC estimation accuracy. The results show that the proposed method is accurate and robust because it can accurately examine SOC under different operating conditions. The key contributions of this study are highlighted below:
• The proposed RNARX-LSA algorithm does not require an added filter in the data pre-processing steps; rather, it only needs sensors to monitor battery signals such as voltage, current, and temperature.
• The RNARX algorithm updates the learning parameters, including weights and biases, through a self-learning algorithm while using the past and present information of the input layer along with past information of the output layer to examine SOC. In contrast, model-based SOC estimation is designed based on deep understanding and knowledge of the lithium-ion battery background processes.
• The RNARX-LSA-based SOC estimation method does not require a battery model, thus avoiding the time and effort needed to construct robust rules and mathematical relationships for capturing the battery behavior and estimating battery model parameters.
• SOC estimation by the traditional RNARX algorithm uses an inefficient trial-and-error method to find the optimal values of hyperparameters, which leads to data overfitting or underfitting problems. Thus, the training operation of RNARX could consume substantial time to find the correct values of hyperparameters. Hence, LSA is combined with the RNARX algorithm to find the best values of hyperparameters, which eventually improves the accuracy of SOC estimation under changing environmental conditions.
• The proposed ML-based SOC estimation is validated by experiments and different EV drive cycles under varying temperature conditions in order to prove its adaptability and generalization capability. In addition, the accuracy and robustness of the RNARX-LSA model are further verified under different noise effects and aging cycles. The proposed method is suitable for an online battery management system (BMS) since SOC is computed extremely fast in real time owing to the low mathematical complexity of the testing stage.

Results
SOC estimation through constant discharge test (CDT). The SOC experimental results under different discharge current rates are presented in this section. The superiority of LSA is compared with three powerful optimization algorithms, namely, backtracking search optimization (BSA), gravitational search algorithm (GSA), and particle swarm optimization (PSO) methods. As shown in Fig. 1

SOC robustness against noise effects. The SOC performance is evaluated against bias noise through experimental tests and EV drive cycles, as shown in Fig. 5. The results show that the RMSE and maximum SOC error in the HPPC 0.25 C discharge load profile are computed to be 0.5885% and 4.33%, respectively. The results are reasonable in 1 C CDT, where the RMSE and maximum SOC error are found to be 0.8404% and 4.67%, respectively. The addition of bias noise to EV drive cycles does not change the SOC estimation results considerably: the proposed approach achieves RMSE and maximum SOC error values of 0.8086% and 3.42%, respectively, in the DST drive cycle. Likewise, in the FUDS drive cycle, the RMSE and maximum SOC error values are 0.7865% and 3.25%, respectively.

The SOC estimation results under random noise are satisfactory, with the SOC error staying within ±5%. The maximum SOC error is under 4% in 1 C CDT and HPPC 0.25 C load profiles. Besides, RMSE is calculated to be 3.47% and 3.51% in 1 C CDT and HPPC 0.25 C load profiles, respectively. The results are suitable in the case of the EV drive cycles, where the maximum SOC error is less than ±5%. The RMSE in the DST and FUDS drive cycles is estimated to be 1.1373% and 1.0268%, respectively. Accordingly, the maximum SOC error is 4.88% and 4.55% in the DST and FUDS drive cycles, respectively.

The SOC estimation results are also verified under the combination of bias and random noises. The results indicate that the mixture of bias and random noises has a small impact on SOC estimation in terms of SOC error and RMSE.
The maximum SOC error is slightly higher than in the two previous cases, although the error remains inside the acceptable range of ±5%. Maximum SOC errors of 4.82% and 4.13% are obtained in 1 C CDT and HPPC 0.25 C load profiles, respectively. Accordingly, the RMSE values are calculated to be 1.1569% and 1.4221%, respectively. The results are satisfactory under EV drive cycles, where the RMSE is 1.2061% and 1.1306% in the DST and FUDS drive cycles, respectively. Consequently, the maximum SOC error is limited to 4.98% and 4.87% in the DST and FUDS drive cycles, respectively. The RNARX-LSA-based SOC estimation method exhibits strong robustness against bias and random noises.
SOC evaluation under aging effects. The proposed method achieves excellent SOC estimation results for a fresh lithium-ion battery. However, estimation accuracy decreases after the battery has been cycled hundreds of times. Hence, the accuracy and robustness of the proposed method are evaluated under different aging cycles. The lithium-ion battery degradation performance is evaluated at four milestone aging cycles, namely, 50, 100, 150, and 200 cycles, as shown in Fig. 6. The cycle life of the LiNiCoAlO2 (LiNCA) battery is found to be 85.92% after 200 aging cycles, a reduction of 9.6% compared with the value achieved after 50 aging cycles. Likewise, the capacity is found to be 3052 mAh after 50 aging cycles and reduces to 2763 mAh after 200 aging cycles. RNARX-LSA is trained using the HPPC experimental dataset of a new LiNCA battery, whereas the datasets of the aged LiNCA battery at 50, 100, 150, and 200 cycles are used to test the performance of the trained model.

Comparative validation with the existing methods. The accuracy and robustness of the RNARX-LSA method are further investigated by evaluating different SOC error rate terms, as depicted in Table 1. Recent and notable studies concerning both traditional and ML-based SOC estimation methods are considered for comparative analysis. The most influential factors related to SOC estimation, such as lithium-ion battery type, temperature, and load profile, are employed to analyze the results. It is observed that the RNARX-LSA-based SOC estimation method outperforms the existing SOC estimation approaches under different EV drive cycles. For instance, RMSE is estimated to be over 1% in the BPNN, ELM, CNN, LSTM, GRU, and GFCA methods, whereas RMSE is found to be under 1% in the proposed approach. Apart from ML techniques, the error rates are also high in conventional methods and model-based approaches, with RMSE over 1% in the OCV, UPF, RLS, and PIO methods.
Moreover, MAE is estimated below 0.6% in the proposed method while that for RBFNN, DNN, WNN, and GPR is above 0.7%. The proposed

Discussion
In this article, we validate RNARX-LSA for SOC estimation using the experimental data obtained through CDT and HPPC tests. We use different discharge current rates to evaluate the accuracy of the proposed model. An extensive comparative study between LSA and BSA, GSA, and PSO is performed by assessing the objective function using the same number of iterations and population size. The proposed RNARX-LSA provides better results than RNARX-BSA, RNARX-GSA, and RNARX-PSO, obtaining the lowest objective function value and a small SOC error under CDT and HPPC tests. The robustness, adaptability, and efficiency of the proposed model are examined under DST and FUDS EV drive cycles. SOC is evaluated at three different temperatures, namely, 0 °C, 25 °C, and 45 °C. The RNARX-LSA-based SOC estimation method achieves excellent results and delivers the minimum SOC error compared with RNARX-BSA, RNARX-GSA, and RNARX-PSO under different EV drive cycles and temperature conditions. The proposed method exhibits better outcomes than state-of-the-art optimized ML methods in terms of reducing RMSE and MAE. The robustness of the proposed model is assessed against bias and random noises. The SOC performance is verified at four milestone aging cycles, namely, 50, 100, 150, and 200 cycles. In all test conditions, the developed method achieves satisfactory results. We conclude that RNARX-LSA is a generalized model that can accurately assess SOC under different operating conditions.

Methods
SOC equation. SOC is calculated as the current capacity divided by the nominal capacity, which is expressed in the following equation 7:

$$\mathrm{SOC} = \mathrm{SOC}_0 - \frac{1}{C_n}\int \eta\, i\, \mathrm{d}t \qquad (1)$$

where SOC is the estimated value, $\mathrm{SOC}_0$ is the reference value, $C_n$ is the nominal capacity, $\eta$ is the coulombic efficiency, and $i$ and $t$ denote the battery charging/discharging current and duration, respectively.
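The Coulomb-counting relation described above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function name `coulomb_count`, the rectangular integration, the fixed sampling interval, and the sign convention (positive current means discharge, so SOC falls) are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Sequence


def coulomb_count(current_a: Sequence[float], dt_s: float, soc0: float,
                  capacity_ah: float, efficiency: float = 1.0) -> list[float]:
    """Return the SOC trajectory (as a fraction) from sampled current.

    Assumption: positive current means discharge, so SOC decreases.
    """
    capacity_as = capacity_ah * 3600.0  # nominal capacity in ampere-seconds
    soc = soc0
    trajectory = []
    for i in current_a:
        # rectangular (sample-and-hold) approximation of the integral of eta * i dt
        soc -= efficiency * i * dt_s / capacity_as
        trajectory.append(soc)
    return trajectory


# Usage: a 3.2 Ah cell discharged at 1 C (3.2 A) for 30 min, sampled once per second.
soc = coulomb_count([3.2] * 1800, dt_s=1.0, soc0=1.0, capacity_ah=3.2)
print(round(soc[-1], 3))
```

With a one-second sampling interval, as used for the recorded dataset, the rectangular rule is usually adequate; a trapezoidal rule would be a natural refinement for coarser sampling.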

Experiments and data development. A test bench was established with lithium-ion batteries for data extraction and SOC evaluation. The test bench is divided into two parts, namely, hardware and software. The hardware part comprises LiNCA batteries and a NEWARE battery testing system (BTS-4000). The LiNCA battery has a rated capacity, nominal voltage, and cut-off voltage of 3200 mAh, 3.6 V, and 2.5 V, respectively. The software part is designed using MATLAB 2015a and version 7.6 of the software supplied with the BTS-4000. A host computer was used to run the software and collect data from the hardware. The BTS-4000 measurement unit was connected to a NEWARE BTS-4000 control unit through the RS485 port, whereas the control unit was connected to the host computer through a TCP/IP port. The steps of the CDT and HPPC tests were programmed and executed in the BTS software, which was also used to run the battery tests at the different charge and discharge current rates. The charging and discharging of the LiNCA battery were controlled using the appropriate function of BTS software version 7.6 while satisfying the cut-off current and voltage values specified by the manufacturer. The experimental dataset, including current and voltage, was recorded every second and stored in the database of the host computer. Subsequently, the dataset was transferred to MATLAB 2015a to execute the RNARX-LSA algorithm.
Training and testing dataset. The entire dataset was divided into two subsets, namely, training and testing subsets. Cross-validation was applied to randomly split the data into training and testing subsets at a 70:30 ratio. The efficiency and robustness of RNARX-LSA training can be enhanced through appropriate data normalization, which improves the convergence rate and removes the negative influence of unscaled inputs. In this study, the input dataset was normalized to the range [−1, 1], as expressed in the following equation 26,

$$x' = \frac{2\,(x - x_{min})}{x_{max} - x_{min}} - 1$$

where $x_{max}$ is the maximum value and $x_{min}$ is the minimum value of the input vector $x$. In this study, the performance goal and the number of epochs were set to 0.000001 and 1000, respectively. The host computer was configured with a Core i5 2.3 GHz processor and 12 GB RAM to execute the algorithm.
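As an illustration of the preprocessing described above, the sketch below scales a signal to [−1, 1] and makes a random 70:30 split. The helper names `minmax_scale` and `split_70_30`, the fixed seed, and the sample voltages are hypothetical; the authors worked in MATLAB, and their exact splitting procedure is not reproduced here.

```python
import random


def minmax_scale(x, lo=-1.0, hi=1.0):
    """Linearly map the values of x onto [lo, hi] using its own min and max."""
    x_min, x_max = min(x), max(x)
    if x_max == x_min:
        raise ValueError("cannot scale a constant signal")
    return [(hi - lo) * (v - x_min) / (x_max - x_min) + lo for v in x]


def split_70_30(samples, seed=0):
    """Randomly partition samples into 70% training and 30% testing subsets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # seeded so the split is repeatable
    cut = int(round(0.7 * len(samples)))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test


# Usage with illustrative terminal voltages between the 2.5 V and 4.2 V limits.
voltage = [2.5, 3.0, 3.6, 4.2]
scaled = minmax_scale(voltage)
train, test = split_70_30(list(range(10)))
print(scaled[0], scaled[-1], len(train), len(test))
```

In practice the minimum and maximum would typically be taken from the training subset only and then reused for the test data, so that no information from the test set leaks into training.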
Objective function formulation. The objective function aims to determine the optimal values of the hyperparameters of the RNARX algorithm through an iterative process that leads to minimum SOC error rates. In this study, RMSE was chosen as the objective function because of the large number of sample variables and the random behavior of SOC errors 27. The objective function is formulated using the following equation 28

hyperparameters of RNARX algorithm, including IDs, FDs, and HNs. The updated population of hyperparameters was checked repeatedly during the iterative process to determine whether any value fell outside the boundary region. Otherwise, the outcome of LSA optimization could deviate, thereby delivering poor SOC estimation results. For example, variable $X_{i,j,k}$ should be between $X_{i,j,k-1}$ and $X_{i,j,k+1}$. When variable $X_{i,j,k}$ is greater than $X_{i,j,k+1}$ or less than $X_{i,j,k-1}$, the hyperparameters of the RNARX algorithm should be regenerated within the boundary, and the results updated accordingly. Therefore, the appropriate limit of the hyperparameters of the RNARX algorithm can be expressed as follows:

RNN is a supervised ML method designed using three layers, namely, input, hidden, and output layers 29. RNARX is a prominent subgroup of RNN that uses one or more feedback loops to address complex and time-series problems 30. SOC estimation by RNARX is performed using the present and past values of the inputs and the estimated past values of the outputs. The output of RNARX can be represented as 31:

where $b_0$ and $b_h$ are the biases, $w_{ih}$, $w_{ho}$, and $w_{jh}$ are the weights, and $f_h(\cdot)$ and $f_0(\cdot)$ are the activation functions. $u_1$ and $u_2$ denote the first and second inputs, respectively, and $y$ represents the output. The hidden layer and output layer operations are executed using logsig and purelin transfer functions, respectively 32.
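The RNARX description above (present and past inputs, past estimated outputs, a logsig hidden layer, and a purelin output layer) can be sketched as a generic single-hidden-layer NARX forward pass. This is an assumption-laden illustration rather than the authors' network: the function names, the tap layout, the handling of the first few samples, and the reading of IDs, FDs, and HNs as input delays, feedback delays, and hidden neurons are all hypothetical.

```python
import math


def logsig(z: float) -> float:
    """Logistic sigmoid, the hidden-layer transfer function named in the text."""
    return 1.0 / (1.0 + math.exp(-z))


def narx_step(u_taps, y_taps, w_in, w_fb, b_h, w_out, b_o):
    """One NARX output from delayed inputs (u_taps) and delayed outputs (y_taps)."""
    hidden = []
    for w_u, w_y, b in zip(w_in, w_fb, b_h):
        z = b
        z += sum(w * u for w, u in zip(w_u, u_taps))
        z += sum(w * y for w, y in zip(w_y, y_taps))
        hidden.append(logsig(z))
    # purelin output layer: a plain weighted sum plus bias
    return b_o + sum(w * h for w, h in zip(w_out, hidden))


def narx_simulate(u1, u2, n_id, n_fd, params, y_init=0.0):
    """Closed-loop run: each estimate is fed back as a past output."""
    y_taps = [y_init] * n_fd
    outputs = []
    for t in range(len(u1)):
        # present sample plus n_id past samples; early steps reuse index 0
        idx = [max(t - k, 0) for k in range(n_id + 1)]
        u_taps = [u1[i] for i in idx] + [u2[i] for i in idx]
        y = narx_step(u_taps, y_taps, *params)
        outputs.append(y)
        y_taps = ([y] + y_taps)[:n_fd]
    return outputs


# Usage with hypothetical sizes: 2 input delays, 1 feedback delay, 2 hidden neurons.
n_id, n_fd, n_hn = 2, 1, 2
params = (
    [[0.0] * (2 * (n_id + 1)) for _ in range(n_hn)],  # input weights
    [[0.0] * n_fd for _ in range(n_hn)],               # feedback weights
    [0.0] * n_hn,                                      # hidden biases
    [1.0] * n_hn,                                      # output weights
    0.1,                                               # output bias
)
soc_hat = narx_simulate([3.2, 3.2, 3.1], [4.1, 4.0, 3.9], n_id, n_fd, params)
print(soc_hat)
```

The weights here are placeholders chosen so the output is easy to check by hand; in the study they would come from training, with the delay and neuron counts supplied by the optimizer.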
Hyperparameter tuning. LSA 33 is used to find the optimal hyperparameters of the RNARX algorithm, which include IDs, FDs, and HNs. LSA uses three types of projectiles, namely, transition, space, and lead projectiles, to search for optimal solutions. Transition projectiles create the first-step leader population, N; space projectiles attempt to reach the best leader position; and the lead projectile represents the best position among the N step leaders. The probability density function $f(x^T)$ of the transition projectile can be expressed as 34,35,

at step+1 can be designed in the form of an exponential distribution with shaping parameter µ. The probability density function $f(x^S)$ of a space projectile can be expressed as 34,35,

The revised position of $p_i^S$ at step+1 is represented as 34,35,

where exprand represents the exponential random number. The corresponding stepped leader $sl_i$ moves toward a new position, $sl_{i\_new}$, when $p_{i\_new}^S$ obtains a satisfactory solution at step+1 and the energy of the projectile $E_{p\_i}^S$ is greater than the energy of the step leader $E_{sl\_i}$. Otherwise, they remain unmoved until the next step is obtained. The normal probability density function of the lead projectile $f(x^L)$ is demonstrated using the following equation 34,35,

The revised location of $p^L$ at step+1 can be represented as 36,37,

6, and PSO 38 using the same population size (50) and number of iterations (500) to ensure a fair assessment. In LSA, the channel time was set to 10. In GSA, the gravitational constant $G_0$ and acceleration $\alpha$ were set to 100 and 20, respectively. In PSO, the acceleration coefficients $c_1$ and $c_2$ were set to 2 and the weight factor $w$ to 0.5. The hyperparameters of the BPNN 39, RBFNN, ELM, DRNN, and RF algorithms were optimized using LSA to conduct a fair comparative analysis. In the BPNN algorithm, LSA was used to find the optimal number of HNs and learning rates. In the RBFNN algorithm, the number of neurons, spread, and width values were optimized using LSA. The optimal number of neurons was obtained using LSA in the ELM algorithm. For DRNN, the number of hidden layers and HNs were optimized using LSA. The best numbers of trees and leaves were achieved using LSA in the RF algorithm.
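To make the three projectile types more concrete, the sketch below draws candidate positions from the distributions named above: an exponential step for space projectiles and a normal step for the lead projectile. The uniform draw for transition projectiles, the random choice of step direction, the acceptance rule, and every function name are assumptions made for illustration; this is not a complete LSA and omits channel time and forking.

```python
import random


def transition_positions(n, lower, upper, rng):
    """Initial step leaders; a uniform draw within the bounds is assumed."""
    return [rng.uniform(lower, upper) for _ in range(n)]


def space_update(position, mu, rng):
    """Move a space projectile by an exponential random step with mean mu."""
    step = rng.expovariate(1.0 / mu)  # expovariate takes the rate, i.e. 1 / mean
    # the step direction is chosen at random here (an assumption)
    return position + step if rng.random() < 0.5 else position - step


def lead_update(position, mu, sigma, rng):
    """Move the lead projectile by a normally distributed step."""
    return position + rng.gauss(mu, sigma)


def accept_if_better(old, new, objective):
    """Keep the new position only if it lowers the objective (minimization)."""
    return new if objective(new) < objective(old) else old


# Usage on a toy one-dimensional objective with its minimum at 3.
rng = random.Random(1)
objective = lambda x: (x - 3.0) ** 2
leaders = transition_positions(5, 0.0, 10.0, rng)
best = min(leaders, key=objective)
for _ in range(200):
    best = accept_if_better(best, lead_update(best, 0.0, 0.5, rng), objective)
print(round(best, 2))
```

The toy objective stands in for the RMSE of a trained network; in the study, each evaluation would involve training RNARX with the candidate hyperparameters.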

SOC effectiveness measures.
The performance of RNARX-LSA-based SOC estimation was verified using different error rate terms. The mathematical equations of these statistical errors are expressed as follows 36

where $\mathrm{SOC}_a$ is the reference value, $\mathrm{SOC}_{es}$ is the estimated value, $\overline{\mathrm{SOC}_{error}}$ is the average value of the SOC error, and $n$ is the number of data observations. The reference SOC is obtained using (1).
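The error terms used in this study (SOC error, MSE, RMSE, MAE, MAPE, and SD) can be computed as in the following sketch, which uses their standard textbook definitions. The function name, the sample (n − 1) form of the standard deviation, and the choice to compute MAPE only over non-zero reference values are assumptions and may differ from the authors' exact formulas.

```python
import math


def soc_error_metrics(soc_ref, soc_est):
    """Standard error measures between reference and estimated SOC (needs n >= 2)."""
    n = len(soc_ref)
    err = [a - e for a, e in zip(soc_ref, soc_est)]  # per-sample SOC error
    mean_err = sum(err) / n
    mse = sum(e * e for e in err) / n
    # MAPE is undefined where the reference is zero, so those samples are skipped
    nonzero = [(e, a) for e, a in zip(err, soc_ref) if a != 0]
    return {
        "max_abs_error": max(abs(e) for e in err),
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in err) / n,
        "mape": 100.0 * sum(abs(e / a) for e, a in nonzero) / len(nonzero),
        "sd": math.sqrt(sum((e - mean_err) ** 2 for e in err) / (n - 1)),
    }


# Usage with illustrative SOC values in percent.
metrics = soc_error_metrics([100.0, 80.0, 60.0], [99.0, 81.0, 60.0])
print(metrics)
```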
Implementation of RNARX-LSA-based SOC estimation algorithm. The execution of the RNARX-LSA algorithm for SOC estimation started with the measurement of battery data, including current and voltage, from the CDT and HPPC experimental tests. Afterward, the IDs, FDs, and HNs of RNARX were optimized through the LSA method based on the minimum value of the objective function. The proposed SOC estimation model was then subjected to various validation tests to check the model accuracy and robustness under different operating conditions. The SOC estimation results were evaluated using different error rate terms and compared with different optimization techniques and ML approaches. The methodological framework of the proposed RNARX-LSA is illustrated in Fig. 7. The overall implementation procedure is divided into three stages. In stage I, the CDT and HPPC battery experimental tests were carried out on the test bench. Afterward, the corresponding dataset, including current and voltage, was generated from the test bench platform. At the same time, the EV dataset, including current, voltage, and temperature, was also collected. Then, the data were pre-processed and normalized in order to improve the training speed. Finally, the data were partitioned for algorithm training and testing.
In stage II, the LSA started by assigning parameters such as population size, iteration number, dimension, input variables, objective function, and optimization constraints. Then, the positions of the step leaders were generated randomly and the objective function was evaluated. Afterward, the channel time was reset by eliminating the bad channel from worst to best. Next, the space projectiles and lead projectile were ejected and their positions were verified based on the objective function. Subsequently, the location of a projectile was updated if the energy of the projectile was higher than that of the step leader. Afterward, the population of hyperparameters was reinitialized within the boundary limit. The process continued until it reached the maximum iteration. Finally, the optimal values of the hyperparameters were sent to the RNARX algorithm, and the RNARX training operation was executed using the Levenberg-Marquardt (LM) algorithm and the RNARX activation function.
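The boundary step in stage II, in which hyperparameters that leave the allowed region are reinitialized within the limits, can be sketched as below. The bounds in `BOUNDS`, the parameter names, and the rounding of candidates to integers (delays and neuron counts being whole numbers) are hypothetical values chosen for the example, not settings reported in the study.

```python
import random

# Hypothetical search limits for the three RNARX hyperparameters.
BOUNDS = {
    "input_delays": (1, 10),
    "feedback_delays": (1, 10),
    "hidden_neurons": (1, 30),
}


def repair(candidate, bounds, rng):
    """Round each value to an integer and redraw any value that leaves its bounds."""
    repaired = {}
    for name, value in candidate.items():
        lo, hi = bounds[name]
        v = int(round(value))
        if v < lo or v > hi:
            v = rng.randint(lo, hi)  # reinitialize uniformly inside the limits
        repaired[name] = v
    return repaired


# Usage: one candidate below its lower bound, one valid, one above its upper bound.
rng = random.Random(42)
candidate = {"input_delays": 0.2, "feedback_delays": 3.6, "hidden_neurons": 45.0}
fixed = repair(candidate, BOUNDS, rng)
print(fixed)
```

Redrawing an out-of-range value, rather than clipping it to the nearest limit, keeps candidates from piling up on the boundary; either choice is a design decision that the text does not settle.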
In stage III, SOC was estimated and the results were verified using different performance indicators such as RMSE, MSE, MAE, MAPE, SD, and SOC error. Subsequently, a comprehensive comparative analysis was performed with well-known optimization approaches and machine learning methods. Finally, the robustness of SOC estimation was assessed under different temperatures, noise effects, and aging cycles.

Figure 1 methods. The CDT experiment 45,46 started by fully charging the LiNCA battery using the constant current constant voltage (CC-CV) method. A CC of 1.6 A (0.5 C) was applied until the charge voltage reached 4.2 V. Then, a CV of 4.2 V was applied until the charge current dropped to 0.064 A (0.02 C). Subsequently, the battery was kept idle for 1 h. Next, a discharge current of 1.5 C/1 C/0.5 C was applied until the discharge voltage declined to 2.5 V. The test ended when the battery voltage reached 2.5 V. Otherwise, the battery was discharged again at 1.5 C/1 C/0.5 C.

Figure 2 methods. The HPPC test 47,48 was executed by generating a combination of charge and discharge current pulses in an orderly manner. The customized HPPC was designed using different charge and discharge current values to verify the robustness of the proposed method. Initially, the battery was charged using the CC method with a current of 1.6 A (0.5 C) until the charge voltage reached 4.2 V. Then, the battery was charged using the CV method at 4.2 V until the charge current dropped to 0.064 A (0.02 C). Subsequently, the battery was discharged at 0.5 C/0.3 C/0.1 C for 10 s, followed by a rest period of 3 min. Next, the battery was charged at 0.5 C/0.3 C/0.1 C for 10 s, followed by a rest period of 3 min. Afterward, the battery was discharged at 0.25 C/0.1 C/0.07 C for 24/60/86 min to decrease the SOC by 10%. The test ended when the battery reached 2.5 V. Otherwise, the battery was discharged again at 0.5 C/0.3 C/0.1 C.

Scientific Reports | (2020) 10:4687 | https://doi.org/10.1038/s41598-020-61464-7

Figures 3 and 4 methods. EV drive cycle data were collected from the Center for Advanced Life Cycle Engineering (CALCE) 49 battery research group. An 18650 NMC cathode-based lithium-ion battery cell with a nominal capacity of 2.0 Ah and a voltage of 3.6 V was used for SOC estimation. Two different patterns of EV drive cycles, namely, DST and FUDS, were utilized to evaluate SOC performance, as depicted in Figs. 4 and 5, respectively. These drive cycles have diverse current profiles in terms of amplitude and time duration. The duration of one cycle for DST and FUDS is 360 and 1372 s, respectively 50. DST corresponds to dynamic charging and discharging, whereas FUDS is related to urban driving. A thermal chamber was used to control the battery temperature. The experiments were conducted at three different temperatures of 0 °C, 25 °C, and 45 °C.

Figure 5 methods. An EV is designed using many sensors and power converters. Electromagnetic interference (EMI) noise is generated when the power converters switch at high frequency, and this noise may be added to the measured current and voltage values. Each sensor in an EV also has equipment errors, which result in errors in the measured current and voltage signals. Therefore, SOC should be examined against bias and random noises, where bias noise corresponds to the sensor precision and random noise is related to EMI. The robustness of the proposed method was checked under positive bias noise by injecting 0.1 A and 0.01 V into the current and voltage measurements, respectively 51. In addition to bias noise, a standard random noise with an amplitude of 0.1 A and 0.01 V was added to the current and voltage measurements 52.

Figure 6 methods. Battery aging is important for determining battery performance after a certain number of aging cycles. The battery capacity decreases as the number of aging cycles increases. First, the cycle life of the LiNCA battery was monitored under different aging cycles. The cycle life was calculated as the current capacity of an aged LiNCA battery cell divided by the capacity of a fresh LiNCA battery cell 53. The aging operation of the LiNCA battery began with the CC-CV method. The battery was charged with a current of 1.6 A (0.5 C) until it reached 4.2 V. Subsequently, the current reduced to 0.064 A (0.02 C) while the voltage remained constant at 4.2 V. The battery was discharged at a current of 1 C (3.2 A) until the battery voltage reached 2.5 V. One aging schedule was completed when the battery reached 2.5 V. After completion of one aging cycle, the battery was rested for 1 h 54,55. The process continued for 50, 100, 150, and 200 cycles.
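The noise test described under the Figure 5 methods (a positive bias of 0.1 A and 0.01 V, plus random noise of the same amplitudes) can be sketched as follows. The helper names, the sample measurements, and the reading of "random noise with an amplitude" as a uniform draw within ±amplitude are assumptions; the source does not state the noise distribution.

```python
import random


def add_bias(signal, bias):
    """Shift every sample by a constant offset (sensor bias)."""
    return [s + bias for s in signal]


def add_random_noise(signal, amplitude, rng):
    """Add zero-mean noise; a uniform draw in [-amplitude, amplitude] is assumed."""
    return [s + rng.uniform(-amplitude, amplitude) for s in signal]


# Usage: corrupt illustrative current (A) and voltage (V) measurements.
rng = random.Random(7)
current = [3.2, 3.2, 1.6, 0.0]
voltage = [4.10, 4.05, 3.90, 3.95]

current_biased = add_bias(current, 0.1)
voltage_biased = add_bias(voltage, 0.01)

# Combined case: bias first, then random noise on top.
current_mixed = add_random_noise(current_biased, 0.1, rng)
voltage_mixed = add_random_noise(voltage_biased, 0.01, rng)
print(current_mixed, voltage_mixed)
```

The corrupted signals would then be normalized and passed to the trained estimator in place of the clean measurements, and the resulting SOC error compared with the ±5% range used in the results.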