Data-driven capacity estimation of commercial lithium-ion batteries from voltage relaxation

Accurate capacity estimation is crucial for the reliable and safe operation of lithium-ion batteries. In particular, exploiting the relaxation voltage curve features could enable battery capacity estimation without additional cycling information. Here, we report the study of three datasets comprising 130 commercial lithium-ion cells cycled under various conditions to evaluate the capacity estimation approach. One dataset is collected for model building from batteries with LiNi0.86Co0.11Al0.03O2-based positive electrodes. The other two datasets, used for validation, are obtained from batteries with LiNi0.83Co0.11Mn0.07O2-based positive electrodes and batteries with the blend of Li(NiCoMn)O2 - Li(NiCoAl)O2 positive electrodes. Base models that use machine learning methods are employed to estimate the battery capacity using features derived from the relaxation voltage profiles. The best model achieves a root-mean-square error of 1.1% for the dataset used for the model building. A transfer learning model is then developed by adding a featured linear transformation to the base model. This extended model achieves a root-mean-square error of less than 1.7% on the datasets used for the model validation, indicating the successful applicability of the capacity estimation approach utilizing cell voltage relaxation.


Supplementary Note 2: Discussion of data splitting methods on the base model
Four splitting strategies (A, B, C, and D) are compared, and the results are presented as follows. The training dataset and test dataset are in a 4:1 ratio, and 5-fold cross-validation is used to determine the hyperparameters of the model during training.
A. Temperature dependence splitting: The training and test results using the temperature-based data splitting method are shown in the Supplementary Information.
B. Time-series splitting: For each cell, data from the early cycles are used for training and data from the later cycles are used for testing.
C. Random splitting: All data are pooled for random sampling without distinguishing the working conditions. Considering the variation in the number of data units under different cycling conditions, the weighted average method achieves similar estimation accuracy, demonstrating the effectiveness of the random data splitting method even without data balancing.
D. Cell stratified sampling on the working conditions: A stratified sampling method is used to select the cells in each working condition, meaning that the data from the same cell are either all in the training set or all in the test set. The cell split is approximately 4:1 for training and testing, as presented in Supplementary Table 9. The result is good, reaching an RMSE of 1.1% for both the XGBoost and SVR methods.
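For illustration, here is a minimal sketch of the cell-level grouping used in Strategy D, assuming hypothetical cell IDs and placeholder feature/label arrays; scikit-learn's GroupShuffleSplit keeps every data unit of a cell on a single side of the split (the per-working-condition stratification is omitted for brevity):

```python
# Sketch of cell-stratified splitting: all data units from one cell land
# entirely in either the training or the test set.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))           # relaxation-voltage features (placeholder)
y = rng.normal(size=1000)                # capacity labels (placeholder)
cells = rng.integers(0, 25, size=1000)   # which cell each data unit comes from

# test_size=0.2 reproduces the approx. 4:1 train/test cell ratio
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=cells))

# no cell appears on both sides of the split
assert not set(cells[train_idx]) & set(cells[test_idx])
```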
By comparing the above data splitting methods, we find that the random splitting and cell stratified sampling methods show similarly good test RMSEs. The temperature dependence splitting method performs worst. One possible reason is the unreasonable split of data amounts between training and test data; for example, the ratio of data amounts is around 2:5 (25 °C and 35 °C for training and 45 °C for testing). Another reason is that the machine learning algorithm is strongly affected by the working conditions, i.e., the model is not capable of "zero-shot learning" when a working condition is absent from the training process. The time-series-based splitting method is also not ideal, meaning that a dataset covering the full degradation range is necessary to train the model. This is due to the nonlinear attenuation of the battery and the strong influence of the working conditions on battery degradation factors.

Supplementary Note 3: Description of benchmarking methods in Supplementary Table 2
(1) Rest voltage based - Linear model: The method presented in Ref. 2 studies the change of Urelax with battery degradation. Urelax is defined as the open-circuit voltage of the battery after 30 min of rest following a full charge. The relationship between Urelax and capacity in dataset 1 is shown in Supplementary Figure 4. A linear model is trained on a randomly selected 80% of the target dataset, and the remaining 20% of the data are used to test the model performance. The result shows that the proposed linear model achieves an RMSE of 2.5%.
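As a sketch of this benchmark, the following assumes placeholder Urelax and capacity arrays in place of dataset 1 and reproduces the 80/20 random split and RMSE evaluation:

```python
# Minimal sketch of the rest-voltage benchmark: a linear model mapping
# U_relax (voltage after 30 min rest following a full charge) to capacity.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

u_relax = np.linspace(4.05, 4.15, 200).reshape(-1, 1)   # placeholder voltages (V)
capacity = 3.5 - 8.0 * (4.15 - u_relax.ravel())          # placeholder capacities (Ah)

X_train, X_test, y_train, y_test = train_test_split(
    u_relax, capacity, test_size=0.2, random_state=42)   # 80/20 random split

model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"test RMSE: {rmse:.4f} Ah")
```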
(2) CC charge voltage based - RFR: The presented method is based on the dependency of the battery capacity on features extracted from the partial CC charge curve. The voltage range from 3.6 V to 3.8 V is selected according to Ref. 3, which also corresponds to the middle SOC range in our work. Each charge curve from 3.6 V to 3.8 V in dataset 1 is discretized at an interval of 2 mV into 101 data points (V0, V1, …, Vk, …, V100). The features extracted as input are the relative capacity values Qk (k = 0, 1, …, 100) at each voltage point, where Qk is calculated by coulomb counting, i.e., integrating the current over the time taken for the battery to charge from V0 to Vk. The initial capacity Q0 is defined as 0. An example of feature extraction for one voltage curve is shown in Supplementary Figure 5. A random forest regressor is trained to map the sequence of relative capacity values Qk to the battery capacity based on a randomly selected 80% of the dataset. For the hyperparameters of the random forest regression, the number of trees is chosen as 6 to be consistent with XGBoost, and the number of random features for each split is chosen as one third of the number of variables. The prediction result on the remaining 20% of the dataset shows that the proposed method achieves an RMSE of 1.0%.
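A minimal sketch of this feature extraction and model setup follows, assuming a placeholder charge record in place of a measured curve:

```python
# Sketch of the Q_k feature extraction: the 3.6-3.8 V charge segment is
# discretized at 2 mV into 101 voltage points, and the relative charged
# capacity at each point is read off the coulomb-counted capacity record.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def extract_qk(voltage, capacity_ah):
    """Relative capacities Q_0..Q_100 at 2 mV steps between 3.6 and 3.8 V."""
    v_grid = np.arange(3.6, 3.8 + 1e-9, 0.002)   # 101 points V_0..V_100
    q = np.interp(v_grid, voltage, capacity_ah)   # coulomb-counted Q(V)
    return q - q[0]                               # Q_0 is defined as 0

# placeholder CC charge record: monotonic voltage and accumulated capacity
voltage = np.linspace(3.5, 4.2, 500)
capacity_ah = np.linspace(0.0, 2.4, 500)
features = extract_qk(voltage, capacity_ah)       # shape (101,)

# 6 trees for consistency with XGBoost; one third of the features per split
rfr = RandomForestRegressor(n_estimators=6, max_features=1/3, random_state=42)
# rfr.fit(...) would then be called on the stacked Q_k vectors of the 80% split
```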
(3) ICA transformation - Linear model: The method in Ref. 4 estimates the battery capacity from the remaining charge electricity (RCE), which is obtained by incremental capacity analysis (ICA) of the battery charging voltage. A threshold is set on the ICA value. Specifically, an ICA curve of one cell in dataset 1 is illustrated in Supplementary Figure 6a, in which the dashed line (dQ/dV = 2.5 mAh/mV) is defined as the threshold. The partial charged capacity from the threshold to the end of charge is counted as the RCE. The relationship between RCE and battery capacity for all cells in dataset 1 is shown in Supplementary Figure 6b. A linear model is trained on a randomly selected 80% of the data samples, and the prediction performance on the remaining 20% shows that the proposed model achieves an RMSE of 1.3%.
(4) CC-CV charge voltage based - GPR: The method presented in Ref. 5 estimates battery capacity using four specific features (F1, F2, F3, and F4) extracted from the CC-CV charge curve, as shown in Supplementary Figure 7. F1 is the duration of the CC charge mode, F2 is the duration of the CV mode, F3 is the slope of the voltage curve at the end of the CC charge mode, and F4 is the vertical slope at the corner of the CC charge curve. A Gaussian process regression model with a radial basis function (RBF) kernel and a white noise kernel is trained on 80% of dataset 1, and the remaining 20% is used for model testing. The result shows that the proposed model achieves an RMSE of 1.1%.
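A minimal sketch of benchmark (4) follows, with a random placeholder feature matrix standing in for the F1-F4 values of dataset 1; the kernel combines scikit-learn's RBF and WhiteKernel as described above:

```python
# Sketch of the CC-CV benchmark: Gaussian process regression with an RBF
# plus white-noise kernel on the four charge-curve features F1-F4.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
F = rng.normal(size=(300, 4))            # columns: F1, F2, F3, F4 (placeholder)
capacity = F @ np.array([0.5, 0.3, 0.1, 0.1]) + rng.normal(0, 0.01, 300)

X_train, X_test, y_train, y_test = train_test_split(
    F, capacity, test_size=0.2, random_state=42)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X_train, y_train)
mean, std = gpr.predict(X_test, return_std=True)   # predictive mean and std
```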

Supplementary Note 4: Discussion of data selection strategies for the transfer learning retraining
For the transfer learning on dataset 2 and dataset 3, several data selection strategies (A, B, C, and D) for TL2 are used, and the results are compared in Supplementary Table 13.
A. Data selection according to the time-series data: 1% of the data are used in A1 and 10% in A2. The RMSEs for TL2 are quite large, illustrating the inappropriateness of the time-series-based splitting method in our study.
B. 1% random data: 1% of the target data from dataset 2 and dataset 3 are randomly selected as input to retrain the transfer learning model. SVR obtains RMSEs of 1.3% and 1.6% on dataset 2 and dataset 3, respectively.
C. Random cells from each working condition: A cell is randomly selected from each cycling condition, meaning that the data of three random cells corresponding to the temperatures in dataset 2 and the data of three random cells corresponding to the discharge rates in dataset 3 are used. An RMSE of 1.4% is achieved on dataset 2, and 1.6% on dataset 3. Note that the amount of input data is approximately 7% of dataset 2 and 33% of dataset 3. Thus, the data volume is reduced in Strategy D for a fairer comparison.
D. Reduction of data volume from the randomly selected cells: The data volume for the TL model re-training is reduced according to the battery cycle number. Data units are chosen from a randomly selected cell (as in Strategy C) at an interval of 100 cycles as the input variables, as sketched below. The sizes of the selected data units are summarized in Supplementary Table 14. TL2 achieves RMSEs of 1.7% on dataset 2 and 1.6% on dataset 3, proving the effectiveness of the transfer learning.
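A minimal sketch of this selection, assuming a placeholder table of data units with hypothetical cell, condition, and cycle columns:

```python
# Sketch of Strategy D: from each randomly chosen cell (one per working
# condition, as in Strategy C), keep only the data units recorded every
# 100 cycles as the re-training input for TL2.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cell": rng.integers(0, 9, 3000),        # 9 hypothetical cells
    "condition": rng.integers(0, 3, 3000),   # 3 working conditions
    "cycle": rng.integers(1, 1200, 3000),
})

# one random cell per working condition (Strategy C)
chosen = df.groupby("condition")["cell"].apply(
    lambda s: rng.choice(s.unique()))

# Strategy D: within the chosen cells, keep cycles at 100-cycle intervals
mask = df["cell"].isin(chosen) & (df["cycle"] % 100 == 0)
retrain_units = df[mask]
```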
In summary, we find that the model using strategies B, C, and D achieves good estimation accuracy. We speculate that these data selection methods capture the effect of the working conditions, meaning that the diversity of working conditions is important for improving model generalization, since input covering more working conditions maps to more battery degradation pathways. The discussion of data splitting methods for the base model and the transfer learning model shows that the generalization of the model is highly related to the working conditions of the battery.

ElasticNet method
The ElasticNet algorithm was proposed by Zou et al.6; it is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. ElasticNet is an extension of ordinary least squares (OLS) regression. In OLS regression, given d features xi1, …, xid, the response yi is predicted by:

$$\hat{y}_i = \hat{\beta}_0 + x_{i1}\hat{\beta}_1 + \cdots + x_{id}\hat{\beta}_d \qquad (1)$$

A model fitting procedure produces the parameter vector $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_d)$.
For a dataset with n observations and d features, let $\mathbf{y} = (y_1, \ldots, y_n)^T$ denote the response vector and $\mathbf{X}$ the $n \times d$ matrix of inputs. The ElasticNet loss function is defined as:

$$L(\lambda_1, \lambda_2, \boldsymbol{\beta}) = \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda_2\|\boldsymbol{\beta}\|_2^2 + \lambda_1\|\boldsymbol{\beta}\|_1$$

If we set $\alpha = \lambda_2/(\lambda_1 + \lambda_2)$, the optimized parameter vector is obtained by:

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2, \quad \text{subject to } \alpha\|\boldsymbol{\beta}\|_2^2 + (1-\alpha)\|\boldsymbol{\beta}\|_1 \le t,$$

where $\alpha\|\boldsymbol{\beta}\|_2^2 + (1-\alpha)\|\boldsymbol{\beta}\|_1$ is called the ElasticNet penalty, which is a convex combination of the lasso and ridge penalties.
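A minimal usage sketch on synthetic data follows; note that scikit-learn parameterizes the penalty as alpha*(l1_ratio*||β||1 + 0.5*(1-l1_ratio)*||β||2^2), so its l1_ratio plays the role of the mixing weight between the lasso and ridge terms above:

```python
# Minimal ElasticNet sketch: fit a sparse linear model on synthetic data.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))               # d = 6 placeholder features
beta_true = np.array([1.0, 0.0, 0.5, 0.0, -0.8, 0.0])
y = X @ beta_true + rng.normal(0, 0.1, 200)

model = ElasticNet(alpha=0.05, l1_ratio=0.5)  # equal lasso/ridge mix
model.fit(X, y)
print(model.intercept_, model.coef_)          # beta_0 and (beta_1..beta_d)
```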

XGBoost method
The XGBoost method7 is a scalable end-to-end tree boosting system designed to be highly efficient, flexible, and portable. It implements machine learning algorithms in the gradient boosting framework. Compared with multiple linear regression, XGBoost has the advantage of being able to handle nonlinear relationships. The prediction of the tree ensemble is defined as:

$$\hat{y}_i = \sum_{t=1}^{K} f_t(x_i), \quad f_t(x) = \omega_{q(x)},$$

where t indexes a tree, q represents the structure of each tree, mapping an example to the corresponding leaf index, and T is the number of leaves in the tree. Each $f_t$ corresponds to an independent tree structure q and leaf weights ω (the outputs of a tree). The objective function is defined as:

$$\mathcal{L} = \sum_i l(\hat{y}_i, y_i) + \sum_t \Omega(f_t),$$

where $l$ is a differentiable convex loss function that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$. The second term Ω penalizes the complexity of the model, which helps to smooth the final learned weights and avoid over-fitting:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2,$$

where $\omega_j$ is the weight of the j-th leaf node, and γ and λ are the coefficients of the penalty term Ω.
Using the second-order Taylor expansion, the objective function at step t can be approximated as:

$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t),$$

where $x_i$ is the input of the i-th sample, and $g_i = \partial_{\hat{y}^{(t-1)}} l\big(y_i, \hat{y}^{(t-1)}\big)$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l\big(y_i, \hat{y}^{(t-1)}\big)$ are the first- and second-order gradients of the loss. After removing the constant terms, the objective function at step t becomes:

$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t).$$

The optimal weight $\omega_j^*$ of leaf j for a fixed structure q(x) can be computed by:

$$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda},$$

where $I_j$ is the set of samples assigned to leaf j. The optimal loss is:

$$\mathrm{obj}^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T.$$

obj* is a function of the tree structure and measures the quality of a tree structure q; the smaller the value of obj*, the better.
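A minimal training sketch on synthetic nonlinear data follows; gamma and reg_lambda correspond to the coefficients γ and λ of the penalty term above, and n_estimators=6 matches the tree count used elsewhere in this work:

```python
# Minimal XGBoost regression sketch on a synthetic nonlinear target.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                                   # placeholder features
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + rng.normal(0, 0.05, 500)   # nonlinear target

model = XGBRegressor(
    n_estimators=6,        # number of boosted trees f_t
    max_depth=3,           # limits the number of leaves T per tree
    gamma=0.0,             # penalty per leaf (the gamma * T term)
    reg_lambda=1.0,        # L2 penalty on leaf weights (lambda)
    objective="reg:squarederror",
)
model.fit(X, y)
pred = model.predict(X[:5])
```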

SVR method
The SVR approach8 is a kernel-based method that does not regress on the original input vector but on its nonlinear expansion, which is mapped by a kernel function to a very high-dimensional feature space. Given a training set of data {(x1, y1), …, (xn, yn)}, where $x_i \in \mathbb{R}^d$ denotes the input space of the sample and $y_i \in \mathbb{R}$ is the target value, i = 1, …, n corresponds to the size of the training data.
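For concreteness, here is a minimal scikit-learn sketch on synthetic data, using the RBF kernel and the ε-insensitive loss defined below; the values of C and ε are illustrative, not the tuned hyperparameters:

```python
# Minimal SVR sketch with RBF kernel and epsilon-insensitive loss.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 3))    # placeholder inputs in R^d, d = 3
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, 300)

# errors within +/- epsilon of the target incur no loss, as in the
# epsilon-insensitive loss function defined in this section
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X, y)
# model.dual_coef_ holds (alpha_i - alpha_i*) for the support vectors
```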
The generic SVR estimating function takes the form:

$$f(x) = \omega \cdot \Phi(x) + b,$$

where $\omega \in \mathbb{R}^d$, $b \in \mathbb{R}$, and Φ(x) is a nonlinear transformation from $\mathbb{R}^d$ to a high-dimensional space. The weight vector ω has the following expansion:

$$\omega = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) \Phi(x_i),$$

where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers. With the kernel function $k(x_i, x) = \Phi(x_i) \cdot \Phi(x)$, the SVR estimating function can be expressed as:

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) k(x_i, x) + b.$$

The goal of SVR is to find the values of ω and b that minimize the total loss:

$$\frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} l_\epsilon\big(y_i, f(x_i)\big),$$

where C is a constant and $l_\epsilon$ is the loss function; the ε-insensitive loss function is used in this research:

$$l_\epsilon\big(y, f(x)\big) = \max\big(0, |y - f(x)| - \epsilon\big).$$

Supplementary Figure 8 The impedance spectra in a are all tested results, visualized as "line + scatter" plots. R0 is the real part of the impedance at the zero crossing. R1//CPE1 in parallel denotes the migration of lithium ions through the solid electrolyte interphase in the high-frequency range. The semi-circle in the medium-frequency range accounts for the charge transfer process and is modeled as R2 in parallel with CPE2 (R2//CPE2). The low-frequency slope is associated with the Warburg impedance (W). The coefficient of determination (R²) between the raw and fitted impedance data is summarized in Supplementary Table 12. All the raw impedance data and fitted data are shared in the data availability (https://doi.org/10.5281/zenodo.6379165).
Supplementary Figure 9 Illustration of the implemented transfer learning process: TL1 (a) and TL2 (b). Variance (Var), skewness (Ske), and maxima (Max) are the input features. SVR means Support Vector Regression.
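For reference, a sketch of computing the three input features named in the caption from a single relaxation curve, using a synthetic relaxation-like voltage trace:

```python
# Sketch of the feature extraction: variance, skewness, and maximum of one
# relaxation-voltage curve.
import numpy as np
from scipy.stats import skew

voltage = 4.0 + 0.05 * np.exp(-np.linspace(0, 5, 300))  # relaxation-like decay

features = {
    "Var": np.var(voltage),   # variance of the relaxation curve
    "Ske": skew(voltage),     # skewness of the relaxation curve
    "Max": np.max(voltage),   # maximum terminal voltage
}
```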
Supplementary Figure 10 Test results of estimated capacity versus real capacity by transfer learning. Results of ZSL embedding XGBoost method (a) and embedding SVR method (b) on dataset 2. Results of ZSL embedding XGBoost method (c) and embedding SVR method (d) on dataset 3.
Supplementary Figure 11 Test results of estimated capacity versus real capacity by transfer learning. Results of the No TL embedding XGBoost method (a) and embedding SVR method (b) on dataset 2. Results of the No TL embedding XGBoost method (c) and embedding SVR method (d) on dataset 3. The poor performance of SVR in the No TL case is limited by the "radial basis function" kernel used in the SVR model. Using a "linear" kernel instead of the "radial basis function" kernel yields better estimation accuracy, as shown in Supplementary Table 15.
Supplementary Figure 12 Test results of estimated capacity versus real capacity by transfer learning. Results of TL1 embedding XGBoost method (a) and embedding SVR method (b) on dataset 2. Results of TL1 embedding XGBoost method (c) and embedding SVR method (d) on dataset 3.
Supplementary Figure 13 The schematic connection of the potentiostat, chamber, and cells. For the NCA and NCM batteries, metal tabs are spot-welded to the cells, and the contacts are soldered to the metal tabs. A four-wire holder is used for the NCM+NCA battery.
Supplementary Table 1 A summary of the typical capacity estimation methods. Note that the table only lists the estimation accuracy of these methods on their specific data (as marked in the "battery dataset" column). Because the performance of a machine learning model depends on the quality and quantity of the input data, the methods cannot be compared directly unless the same data are used. The data marked by an asterisk (*) are publicly available datasets. CALCE data*: https://web.calce.umd.edu/batteries/data.htm; LFP data*: https://data.matr.io/1/projects/5c48dd2bc625d700019f3204; NASA data*: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/; Oxford data*: https://ora.ox.ac.uk/objects/uuid:03ba4b01-cfed-46d3-9b1a-7d4a7bdf6fac. For LiFePO4/C, LiFePO4 is the positive electrode.
Supplementary Table 5 Statistical features extracted from one voltage relaxation curve, where xi is the battery terminal voltage, i = 1, …, n, and n is the number of samples in one relaxation curve.
Supplementary Table 12 The coefficient of determination (R²) between the raw and fitted impedance data. All the raw impedance data and fitted data are shared in the data availability (https://doi.org/10.5281/zenodo.6379165). An R² marked with * means that no R1 and CPE1 elements (shown in Supplementary Figure 8b) are included in the fitting.