The impact of physicochemical features of carbon electrodes on the capacitive performance of supercapacitors: a machine learning approach

Hybrid electric vehicles and portable electronic systems use supercapacitors for energy storage owing to their fast charging/discharging rates, long cycle life, and low maintenance. Specific capacitance is regarded as one of the most important performance-related characteristics of a supercapacitor’s electrode. In the current study, Machine Learning (ML) algorithms were used to determine the impact of various physicochemical properties of carbon-based materials on the capacitive performance of electric double-layer capacitors. Published experimental datasets from 147 references (4899 data entries) were extracted and then used to train and test the ML models, to determine the relative importance of electrode material features on specific capacitance. These features include current density, pore volume, pore size, presence of defects, potential window, specific surface area, and the oxygen and nitrogen content of the carbon-based electrode material. Additionally, categorical variables such as the testing method, electrolyte, and carbon structure of the electrodes are considered. Among five applied regression models, an extreme gradient boosting model was found to best correlate those features with the capacitive performance, highlighting that the specific surface area, the presence of nitrogen doping, and the potential window are the most significant descriptors for the specific capacitance. These findings are summarized in a modular and open-source application for estimating the capacitance of supercapacitors given, as the only inputs, the features of their carbon-based electrodes, the electrolyte and the testing method. In perspective, this work introduces a new, wide dataset of carbon electrodes for supercapacitors extracted from the experimental literature, and gives an example of how electrochemical technology can benefit from ML models.


Introduction
Electrochemical capacitors (supercapacitors) are electrochemical devices that are extensively used for energy storage due to promising characteristics such as high power density, electrochemical stability, fast charge/discharge rates, safe operation, and long cycle life [1-3]. These characteristics enable their use in a broad range of energy storage applications, e.g., hybrid electric vehicles, portable electronics, and memory backup systems [4-6]. Beyond energy storage, supercapacitors have other interesting applications, such as heat-to-current conversion of low-grade thermal energy [7] and renewable energy extraction from water solutions [8]. Supercapacitors are primarily classified into two types based on their charge storage mechanism: (i) electric double-layer capacitors (EDLCs), which store electrical charge via ion adsorption at the electrode surface, and (ii) pseudo-capacitors, which store charge via reversible Faradaic redox reactions (see Figure 1). Generally, EDLCs have superior cycle stability but lower specific capacitance than pseudo-capacitors, which have a high specific capacitance but a low power density and poor cycle stability [9,10]. The current study focuses on EDLC supercapacitors and their optimization.
For the optimization of the electrochemical performance of EDLC supercapacitors, it is critical that the electrode materials have suitable physicochemical properties, including appropriate pore size distribution, high specific surface area, high electrical conductivity, as well as electrochemical and mechanical stability for good cycling performance [9]. Numerous materials have been synthesised and used as supercapacitor electrodes in recent years, including porous carbon [11-13], hierarchical porous carbon [14-16], activated carbon [10,17,18], graphene [9,19,20], rGO-PANI nanocomposite [21], carbon nanotubes [22-24], In2O3-loaded porous carbon [25], and carbon aerogels [26,27]. Among these electrode materials, carbon is the most frequently used due to its versatility and uniqueness [28]. It exists in various forms (e.g., graphite, diamond), dimensionalities (fibres, fabrics, foams, and composites), and ordered or disordered structures (depending on the degree of graphitization), with good electrical conductivity [29,30]. The catalytic, optical, mechanical, and electrochemical properties of carbon make it an excellent material for energy conversion and storage applications [31]. Additionally, its well-established synthesis and activation methods enable its use as an electrode in supercapacitors with an appropriate pore size distribution [32].
Porous and activated porous carbon (AC) and hierarchical porous carbon (HPC) have been proposed in the literature for carbon electrodes (see Supplementary Note 1 for a detailed review). Because of its high specific surface area (SSA), improved electrical conductivity, adjustable pore sizes, electrochemical stability and low cost, porous carbon offers significant promise for use as the electrode [33-36]. These properties make AC an excellent material for a variety of applications, including water purification, gas separation and storage, and electrode materials for capacitors, fuel cells and batteries [37]. Unlike AC, hierarchical porous carbon contains pores over a wide range of length scales, namely macro- (>50 nm), meso- (2-50 nm), and micro- (<2 nm) scales. The presence of macropores in HPC allows high-rate ion transport and acts as an ion reservoir. Furthermore, the interconnected mesopores provide low-resistance pathways for the diffusion of ions, whereas the high SSA of micropores enhances the adsorption of ions at the pore surface [38]. These unique properties have recently attracted interest in HPC as an electrode material for supercapacitors.
Besides the SSA and pore volume, several other factors influence the performance of electrodes in supercapacitors, such as surface functional groups and conductivity. These can be modified by introducing heteroatoms (HA; nitrogen, oxygen, sulphur, etc.) into the carbon electrodes, which not only enhance the wettability but also improve the electronic conductivity of activated carbon [39]. Nitrogen doping of carbonaceous material, acting as an electron donor, is useful for enhancing the specific capacitance via Faradaic reactions and for enhancing wettability [16]. Similarly, oxygen doping improves the surface wettability, which in turn improves the supercapacitor performance [39]. Sulphur doping of carbonaceous material increases its bandgap, thus enhancing the electron donor properties and changing the electronic density of states. Sulphur doping also increases wettability, which in turn decreases the diffusion resistance between the electrode and the electrolyte ions [40].
In perspective, graphene is also a promising electrode material for supercapacitors, due to its high electrical conductivity, high SSA, and excellent mechanical strength [41,42]. Its porous structure also facilitates charge transport in the supercapacitor. The SSA of graphene is highly tuneable according to the requirements of supercapacitor electrodes for energy storage applications. Also, the presence of highly mobile free electrons on its orbitals is responsible for its exceptionally high electrical conductivity [41]. Furthermore, the electrical behaviour of graphene can be improved through functionalization [43] and heteroatom doping [44].
Numerous attempts have been made to increase the specific capacitance of supercapacitors by utilizing different types of carbon electrodes with varied pore size distributions, high specific surface area, diverse morphologies, and modified surface chemistry [45]. However, the influence of these physicochemical parameters on the specific capacitance of supercapacitors is not yet completely understood. Additionally, conventional theories and models cannot capture with sufficient accuracy the microscopic details of the underlying physical mechanisms affecting ion transport, which are essential for accurately predicting the capacitive performance of supercapacitors. Recent advances in machine learning (ML) algorithms and their application to physics-based systems have made it possible to recognize the effects of various physicochemical features of carbon-based electrode materials in enhancing the specific capacitance of supercapacitors. In detail, Zhu et al. [46] used an artificial neural network (ANN) algorithm to predict the specific capacitance of carbon-based supercapacitors. They collected 681 data entries from published experimental papers, with information about specific surface area, pore size, presence of defects, nitrogen doping level, and potential window. The authors concluded that ANN yields better predictability of specific capacitance than linear regression and Lasso methods. However, the ANN method could not discriminate the impact of each feature separately. Su et al. [47], instead, predicted the specific capacitance of carbon-based electric double-layer capacitors using four different ML models, namely linear regression (LR), support vector regression (SVR), multilayer perceptron (MLP), and regression tree (RT). The authors ranked the performance of the different ML models as follows: RT > MLP > SVR > LR. They found that the specific surface area, potential window, and heteroatom doping enhance the specific capacitance of EDLC supercapacitors. Nevertheless, the authors did not analyse the effect of the pore volume and pore size of the electrodes. Finally, Zhou et al. [48] proposed an ML model to determine the features with the strongest impact on the specific capacitance and power density of supercapacitors, limiting their analysis to activated carbon materials for the electrodes and a 6 M KOH electrolyte.
Although some data-driven analyses of the relation between a few features of supercapacitors and their specific capacitance have been reported in previous studies, a comprehensive study covering more physicochemical features, electrode materials, testing methods and electrolytes has been hindered by the limited number of entries in the databases considered. In this work, we first created a larger dataset by extracting data from 147 experimental research articles on supercapacitors comprising carbon-based electrodes. The resulting curated dataset consists of 4899 entries and primarily contains information about the specific surface area, the presence of defects, the pore volume and pore size, the potential window, the current density, and the nitrogen and oxygen content of the carbon-based electrode materials. Additionally, the importance of categorical variables such as the testing method, electrolyte, and carbon structure of the electrode on the specific capacitance was studied for the first time. ML algorithms were then applied to this dataset to identify the characteristics of the electrode material that significantly affect the capacitive performance, and to develop the best possible model for predicting the specific capacitance of supercapacitors. To ease the transferability of results, we developed SUPERCAPs, an open-source software tool that estimates the specific capacitance of carbon-based EDLCs according to the structural features of the electrodes, the electrolyte solution and the testing method.

Dataset creation
To develop the dataset, we extracted information from 147 research articles on carbon-based electrode supercapacitors, collecting 4899 data entries (see Supplementary Dataset and Supplementary Note 2 for a detailed list of data sources). Each data entry includes information related to the carbon electrodes (e.g., pore size, pore volume), the test system (e.g., electrolyte, potential window, current density), and the resulting specific capacitance. The latter is defined as $C = \varepsilon S / d$, where $C$, $\varepsilon$, $S$, and $d$ are the specific capacitance, the permittivity of the electrolyte, the surface area of the electrode-electrolyte interface, and the charge separation distance, respectively.
The various parameters included in the dataset that characterize the electrodes and the test system are as follows:
1. Specific surface area (SSA, [m²/g]). The specific capacitance of EDLC supercapacitors depends on the adsorption of electrolyte ions on the electrode surface, and therefore depends directly on the surface area of the electrode material. Thus, to enhance the specific capacitance, a high specific surface area of the electrode material is preferable [1,29].
2. Pore size (PS, [nm]). The presence of micro/mesopores in carbon-based electrodes provides efficient pathways for the transport of electrolyte ions, which leads to rapid ionic diffusion in the supercapacitor [49-51].
3. Pore volume (PV, [cm³/g]). This feature is related to PS, with an additional normalization with respect to the mass of the electrode.
4. Ratio between D and G peaks (ID/IG, [-]). A high ratio of intensities between the D and G peaks indicates an increase in defects, which leads to a decrease in the electrical conductivity of carbon-based electrodes. The reduced electrical conductivity of the electrode material affects the capacitive performance of the supercapacitor [49].
5. Nitrogen content in the electrode (N%, [%]). Nitrogen doping in the carbon matrix of the electrode material improves the specific capacitance through Faradaic reactions. It not only enhances the charge mobility on carbon surfaces, but also increases their wettability [16,52].
6. Oxygen content in the electrode (O%, [%]). The oxygen content in the electrode material improves the wettability of the electrode surface in the electrolyte, which enhances the electrochemical performance of the supercapacitor [39].
7. Sulphur content in the electrode (S%, [%]). Sulphur is the most reactive element among the heteroatom dopants due to its unpaired electrons and wider bandgap. Increased specific capacitance results from the sequence of Faradaic reactions on sulphur-doped carbonaceous materials [40].
8. Potential window (PW, [V]). The potential window is the range of potentials in which no Faradaic reaction occurs, implying that the material and the electrolyte are stable when a potential in this range is applied. It depends on the type of material and electrolyte.
9. Current density (I, [A/g]). In porous carbonaceous electrodes, the electrolyte ions do not have sufficient time to reach the microporous surface of the electrode at a high current density, because of the fast charging rate. Therefore, increasing the current density degrades the capacitive performance of the supercapacitor [53,54].
Furthermore, the following categorical variables have also been included in the dataset:
1. Electrolyte type. The type of electrolyte is crucial to supercapacitor performance. A good electrolyte has a broad potential window, strong electrochemical stability, high ionic concentration, and high conductivity. Electrolytes are classified into three types: aqueous, organic liquid and ionic liquid [55].
2. Method of testing specific capacitance. The two-electrode and three-electrode methods are the two methods for evaluating the specific capacitance. The two-electrode system consists of working and counter electrodes, where the potential is supplied and the resultant current is measured at either the working or the counter electrode. The three-electrode system consists of a working electrode, a counter electrode, and a reference electrode. The reference electrode serves as a reference for measuring and adjusting the working electrode potential, without carrying any current.
3. Electrode structure. The carbon structure of the electrode (AC, HPC, or heteroatom (HA)-doped) has a significant effect on its performance, as comprehensively discussed in Supplementary Note 1.
Figure 2 (a)-(g) shows the influence of the various physicochemical parameters on the specific capacitance of supercapacitors at a current density of 1 A/g for the whole dataset (note that, due to lack of data at 1 A/g, the relationship between specific capacitance and sulphur doping percentage is not presented), whereas Figure 2 (h) shows the relationship between specific capacitance and different current densities. While SSA shows a certain correlation with the specific capacitance, in agreement with previous results [1,29], the other features of carbon-based supercapacitors have a less clear and nonlinear influence on the specific capacitance, therefore requiring advanced data analysis tools to fully understand it.
Prior to applying the regression algorithms to the dataset containing the physicochemical properties and specific capacitance of carbon-based electrodes, it is necessary to pre-process the data to remove possible gaps and outliers. The process of improving data quality is known as data curation, and it entails the following activities [56]:
• Data integration. The raw data entries in the dataset were derived from various research articles that use a variety of physical units to represent parameters (for example, SSA can be expressed in m²/g or in cm²/g). We maintained consistent physical units across the dataset and converted them whenever needed.
• Outlier detection. The dataset was analysed to identify missing values, erroneous values extracted from research articles, or values that are incorrectly formatted, which could skew the results.
Once data curation is completed, the clean dataset (4538 data entries) can be used to train and test the regression algorithms.
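The two curation activities described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`ssa`, `ssa_unit`, `capacitance`) and the sample values are hypothetical and do not come from the Supplementary Dataset, and the actual curation rules applied to the 4899 raw entries may differ.

```python
import math

# Hypothetical raw entries: field names and values are illustrative only.
raw_entries = [
    {"ssa": 1500.0, "ssa_unit": "m2/g", "capacitance": 210.0},
    {"ssa": 1.2e7, "ssa_unit": "cm2/g", "capacitance": 180.0},
    {"ssa": None, "ssa_unit": "m2/g", "capacitance": 150.0},
    {"ssa": 900.0, "ssa_unit": "m2/g", "capacitance": float("nan")},
]

# Divisors that bring SSA to the common unit m2/g (1 m2 = 1e4 cm2).
DIVISOR_TO_M2_PER_G = {"m2/g": 1.0, "cm2/g": 1e4}


def is_valid_number(value):
    """Return True for finite numbers; rejects None, NaN and infinities."""
    return isinstance(value, (int, float)) and math.isfinite(value)


def curate(entries):
    """Convert SSA to m2/g and drop entries with missing or malformed values."""
    clean = []
    for entry in entries:
        if not (is_valid_number(entry["ssa"])
                and is_valid_number(entry["capacitance"])):
            continue  # missing or incorrectly formatted value
        divisor = DIVISOR_TO_M2_PER_G.get(entry["ssa_unit"])
        if divisor is None:
            continue  # unknown unit: cannot be integrated safely
        clean.append({
            "ssa_m2_per_g": entry["ssa"] / divisor,
            "capacitance": entry["capacitance"],
        })
    return clean


curated = curate(raw_entries)
print(len(curated), "of", len(raw_entries), "entries kept")
```

In this toy example, the second entry is converted from cm²/g to m²/g, while the last two entries are discarded because a value is missing or not a number.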

Regression model and metrics
Five approaches were adopted to carry out the regression of the target specific capacitance from the physicochemical features of the supercapacitors (see Supplementary Note 3 for details), namely the Ordinary Least Square Regression (OLS) method and four ML approaches: Support Vector Regression (SVR); Regression Decision Tree (DT); Random Forest Regression (RF); Extreme Gradient Boosting Regression (XGBoost).
OLS is one of the most common regression models, where the unknown parameters of a linear regression are estimated by minimizing the sum of the squared differences between the target responses in the sample data and the values predicted by a linear function of the explanatory variables [57].
SVR is a well-established supervised machine learning approach for predicting continuous values. SVR operates on the same principle as the Support Vector Machine (SVM). The primary principle of SVR is to determine the best-fit line. Support vectors are the data points that lie closest to the hyperplane and therefore determine its position [58]. SVM defines an optimal hyperplane as a discriminative classifier, whereas in SVR the best-fit line is the hyperplane that contains the largest number of points. In a two-dimensional space, the hyperplane is a line that divides the plane into two parts, one on either side. For instance, a dataset made of two classes (e.g., green and red points) can be separated by many different lines. However, selecting an optimal hyperplane is not trivial, as it should be insensitive to noise and should generalize accurately to unseen data [59]. Accordingly, SVM determines the optimized hyperplane that provides the largest minimum distance to the training data [58,59]. SVR attempts to minimize the difference between the real and predicted values by fitting the best line within a certain threshold value. The distance between the hyperplane and the boundary line is the threshold value [60].
DT constructs regression or classification models based on the data features arranged in a tree structure. In a tree, every node is related to a property of a data feature, and the tree predicts either the target value (regression) or the target class (classification). The closer a node is to the root of the tree, the greater its influence [61]. One benefit of DT is its capability to handle both categorical and numerical data.
RF is an ensemble learning technique that can perform both regression and classification tasks using multiple decision trees. During training, the algorithm generates a large number of decision trees using a probabilistic scheme [62]: every tree is trained on a bootstrapped sample of the original training data and, at each node, considers a randomly selected subset of the input variables to determine the split. Every tree in the RF makes its own individual prediction or casts a unit vote for the most popular class at input x. These predictions are then averaged in the case of regression, or the majority vote determines the output in the case of classification [62]. The core concept is to use numerous decision trees to determine the final output rather than depending on individual decision trees.
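The bootstrap-and-average idea behind RF regression can be illustrated with a deliberately simplified sketch. It is not the implementation used in this study. Each "tree" is reduced to a single-split stump on one feature, so the random selection of input variables at each node is omitted. The function names and the toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature (least squares)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no valid threshold between identical x values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((y - mean_left) ** 2 for y in left)
               + sum((y - mean_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_left, mean_right)
    if best is None:  # all x identical: always predict the sample mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, mean_left, mean_right = stump
    return mean_left if x <= threshold else mean_right


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in sample],
                                [ys[i] for i in sample]))
    return forest


def predict_forest(forest, x):
    """Regression output: average of the individual tree predictions."""
    return sum(predict_stump(stump, x) for stump in forest) / len(forest)


# Toy data (hypothetical): a step from 100 to 200 at x = 5.5.
xs = list(range(1, 11))
ys = [100.0 if x <= 5 else 200.0 for x in xs]
forest = fit_bagged_stumps(xs, ys)
low, high = predict_forest(forest, 2), predict_forest(forest, 9)
```

Averaging over many bootstrapped trees makes the ensemble prediction less sensitive to the particular sample each tree was trained on, which is the motivation given above for not relying on individual decision trees.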
XGBoost is an implementation of gradient boosting machines designed mainly for speed and performance in supervised learning. In supervised learning, various features in the training data are used to predict the target values. XGBoost applies tree algorithms to a known dataset and models the data accordingly [63]. In this model, decision trees are constructed sequentially. Each new tree is fitted to the errors left by the trees built before it, so that observations predicted poorly by the earlier trees receive more attention in the subsequent ones. These individual trees are then combined to form an efficient and precise model. XGBoost can be used for both classification and regression problems [64].
To evaluate the performance of the regression models, the $n$ predicted results ($y_i$) were compared to the original ones ($\hat{y}_i$) using the following metrics [65]:
• Root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
An RMSE value closer to zero denotes a better prediction.
• Coefficient of determination (R²):
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^2}$$
where $\bar{y}$ is the mean of the $\hat{y}_i$ values. An R² value closer to one represents a better prediction.
• Bias factor ($b'$): the value predicted by the model is unbiased if $b' = 1$.
• Mean absolute percentage error (MAPE):
$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{\hat{y}_i}\right|$$
A MAPE value closer to zero denotes a better prediction.
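As a quick numerical check of these metrics, the sketch below evaluates RMSE, R² and MAPE on a toy set of values. It assumes the standard textbook definitions, with the original (measured) values in the denominator of the MAPE. The bias factor is left out because its formula is not spelled out in this text. The function name and the sample numbers are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, R2 and MAPE (in percent) for paired value lists."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual and total sums of squares.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    mape = 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return {"rmse": rmse, "r2": r2, "mape": mape}


# Toy specific capacitance values in F/g (illustrative only).
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

For these numbers, the residual sum of squares is 200 and the total sum of squares is 20000, so R² = 0.99, RMSE = sqrt(200/3) ≈ 8.16 F/g, and MAPE = 5%.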

Correlation analysis
Correlation analysis is a statistical technique used for determining the strength of the relationship between a pair of parameters (variables) [66]. To estimate the correlation between each pair of parameters, Spearman's rank correlation coefficient ($\rho$) was used, in order to encompass nonlinear relations as well [67]. The correlations (absolute values of $\rho$) between the various parameters of the supercapacitors (i.e., possible descriptors) and their specific capacitance (i.e., figure of merit) at different intervals are presented in Table 1. This analysis revealed that the specific capacitance of the supercapacitor has a moderate correlation with the SSA, a weak correlation with the nitrogen and oxygen content of the carbon electrode, and a very low or negligible correlation with the remaining parameters. These findings suggest that SSA, N%, and O% are important parameters for enhancing the capacitive performance of supercapacitors. In addition, the cross-correlation analysis between all considered parameters depicted in Figure 3 shows that the physicochemical parameters of carbon electrodes are largely independent of one another, except for SSA, PV, and PS, which have a weak or moderate geometrical intercorrelation. As a result, we can assert that the physicochemical parameters of the carbon electrode have a mostly independent effect on the supercapacitor's specific capacitance, and are therefore better considered separately from each other.
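To make the choice of a rank-based coefficient concrete, the following sketch computes Spearman's coefficient as the Pearson correlation of the ranks, assigning tied values the average of their ranks. It is a minimal illustration with hypothetical function names and toy numbers, not the routine used to produce Table 1 or Figure 3.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (illustrative): capacitance grows monotonically but nonlinearly.
ssa = [500.0, 1000.0, 1500.0, 2000.0, 2500.0]
capacitance = [80.0, 120.0, 200.0, 210.0, 400.0]
rho = spearman(ssa, capacitance)
linear_r = pearson(ssa, capacitance)
```

Because the toy relationship is monotonic, the rank-based coefficient equals 1 even though the linear (Pearson) coefficient is lower, which is why a rank-based measure is suited to nonlinear trends.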

Comparison between regression methods
After completing data profiling and correlation analysis, we applied five different regression models to the dataset: ordinary least square regression, support vector regression, decision tree, random forest, and extreme gradient boosting. The dataset was divided into two parts: 70% of the data were randomly selected for training the regression models and the remaining 30% for testing. The total number of data entries used for training and testing was 3176 and 1362, respectively. The nine physicochemical characteristics of the carbon electrodes of supercapacitors in the dataset were considered as independent variables, and the resulting specific capacitance as the dependent one. The results are depicted in Figure 4 as a comparison between the real specific capacitance obtained from the literature articles in the dataset and the values of specific capacitance predicted by the regression models. In each panel, the perfect match between the actual specific capacitance and the predicted one is shown by the straight diagonal line, along which the predicted value equals the actual one.
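The 70:30 random partition can be reproduced in outline as follows. The sketch only shuffles entry indices, with a hypothetical function name and an arbitrary seed, and it assumes that the training size is obtained by truncating 70% of the 4538 curated entries, which is consistent with the 3176 and 1362 entries reported above.

```python
import random


def train_test_split_indices(n_entries, train_fraction=0.7, seed=0):
    """Shuffle entry indices and cut them into training and testing parts."""
    indices = list(range(n_entries))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(train_fraction * n_entries)  # truncates 3176.6 to 3176
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(4538)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the split reproducible, so that all five regression models can be compared on exactly the same training and testing entries.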
As illustrated in Figure 4 (a), the match between the actual and predicted values of specific capacitance using the OLS method was low, as evidenced by the significant deviation of numerous data points from the diagonal line and the R² value of 0.32. Additionally, the large RMSE and MAPE values in Table 2 indicate that OLS regression achieves inferior prediction capability when compared with the DT, SVR, RF, and XGBoost approaches.
The performance analysis of the tree-based model indicates that the DT model in Figure 4 (c) does not accurately predict the actual specific capacitance. Instead, the SVR, RF and XGBoost models were more accurate at predicting the specific capacitance. As illustrated in Figure 4 (b), (d) and (e), most data points lie near the diagonal line, indicating prediction accuracy as supported by the R² values of 0.72, 0.75 and 0.79 for the SVR, RF and XGBoost models, respectively. Moreover, the other performance parameters (RMSE, b' and MAPE) also indicate that the SVR, RF and XGBoost models yielded superior regression capabilities when compared to the OLS and DT models. Since the performance analysis in Table 2 revealed that the XGBoost model showed the best R² and RMSE values, only this regression was employed in the following analyses on the dataset. Notice that XGBoost showed better prediction performance than an artificial neural network as well (see Supplementary Note 4 for details).

Influence of specific capacitance testing method
It is well established that the method of experimental testing can influence the magnitude of the specific capacitance of supercapacitors. For instance, for the AC-based electrode developed by Meng et al. [68], specific capacitance values of 225 F/g and 465 F/g were measured using the two-electrode and three-electrode testing methods, respectively. Therefore, to investigate the effect of the testing method on specific capacitance, the primary dataset generated in the current study was divided into two subsets: one contained the specific capacitance values obtained using the three-electrode method of testing (2754 data entries) and the other comprised the specific capacitance values obtained using the two-electrode method (1784 data entries). The XGBoost model was trained again on each of these two subsets of data.
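Splitting the dataset by a categorical variable before retraining amounts to a simple grouping step, sketched below. The field name `method`, its labels and the four sample entries are hypothetical. In the actual dataset, the two subsets contain 2754 and 1784 entries, which together make up the 4538 curated entries.

```python
from collections import defaultdict


def split_by_category(entries, key):
    """Group data entries into subsets that share the same categorical value."""
    subsets = defaultdict(list)
    for entry in entries:
        subsets[entry[key]].append(entry)
    return dict(subsets)


# Hypothetical entries: field names and values are illustrative only.
entries = [
    {"method": "three-electrode", "ssa": 1800.0, "capacitance": 320.0},
    {"method": "two-electrode", "ssa": 1750.0, "capacitance": 160.0},
    {"method": "three-electrode", "ssa": 2100.0, "capacitance": 365.0},
    {"method": "two-electrode", "ssa": 900.0, "capacitance": 95.0},
]

subsets = split_by_category(entries, "method")
for method, subset in subsets.items():
    # A separate regressor would be trained on each subset at this point.
    print(method, len(subset))
```

The same helper applies unchanged to the other categorical variables considered in this work, such as the electrolyte or the carbon structure of the electrode.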
The actual specific capacitance and the results predicted by the XGBoost model were found to match strongly for the three-electrode method of testing, as evident in Figure 5 (a). Such good prediction capability is also highlighted by the statistical performance parameters, viz. R² = 0.89, RMSE = 28.71, b' = 0.98, and MAPE = 28.71, which are improved with respect to the regression analyses on the whole dataset reported in Section 3.2. Hence, the testing method has a significant effect on the specific capacitance value, and is a further (categorical) variable to be considered in the prediction of specific capacitance. We therefore investigated the significance of the different independent variables in the trained XGBoost model, which indicates how the physicochemical parameters of carbon electrodes influence the specific capacitance. Figure 5 (b) depicts the feature importance analysis for the three-electrode testing method, where higher shares are associated with a stronger influence of the variables on specific capacitance: the SSA, heteroatom doping (N%), and PV were found to be the major factors influencing the specific capacitance.
The correlation between the actual and predicted specific capacitance for the dataset obtained using the two-electrode method of testing is shown in Figure 5 (c). Again, the regression accuracy is improved by considering the subset of data measured with the two-electrode method rather than the whole dataset. In fact, the performance parameters for the XGBoost regression of the two-electrode dataset are R² = 0.93, RMSE = 19.45, b' = 0.989, and MAPE = 31.07, and the R² and RMSE values are thus better than those obtained on the whole dataset. In this case, the PW, SSA, and the ID/IG ratio of the carbon electrode were found to contribute most towards enhancing the supercapacitors' specific capacitance, as observed from the feature analysis in Figure 5 (d). Interestingly, the SSA is found to be an influential physicochemical characteristic of supercapacitors in both testing methods, while small discrepancies emerge for the other variables. PW is found to be a relevant parameter for specific capacitance only in the case of two-electrode measurements, thus appearing as a possible descriptor able to discriminate between the adopted methods of testing.

Influence of electrolyte
The specific capacitance of a supercapacitor is determined not only by the physicochemical behaviour of the electrode material and the testing method, but also by the type of electrolyte used. For instance, Zhou et al. [52] synthesised hierarchical nitrogen-doped porous carbon and demonstrated a specific capacitance of 339 F/g in 6 M KOH and 282 F/g in 1 M H2SO4, at a current density of 0.5 A/g. Thus, to decouple the effect of different electrolytes from our analyses, we considered configurations with either 6 M KOH or 1 M H2SO4. Consequently, we extracted only the data entries characterized by 6 M KOH (2819 entries) and 1 M H2SO4 (471 entries) from the overall dataset. In these cases, 80% of the dataset was considered for training and 20% for testing the XGBoost model. Due to the limited data entries for the 1 M H2SO4 electrolyte, we then considered only the aqueous electrolyte 6 M KOH (which has the additional benefits of being inexpensive and safe, with a high dielectric constant and specific capacitance) to discriminate again between the two-electrode and three-electrode testing methods. Refining the datasets by considering a specific electrolyte (6 M KOH) and method of testing further improved the regression performance with respect to the results in Sections 3.2 (overall dataset) and 3.3 (datasets separated for three- and two-electrode testing methods). This is evident from Figure 7. Notice that, due to the limited data entries for different concentrations of electrolyte in the current database (1 M KOH has 12 entries, 2 M KOH has 169 entries and 3 M KOH has 132 entries), we could not train a robust XGBoost model specifically dedicated to exploring this effect on the specific capacitance of supercapacitors.

Influence of carbon electrode structure
Carbon exists in various allotropic forms with distinct morphologies and physicochemical properties. Activated carbon, hierarchical porous carbon, heteroatom-doped porous carbon and graphene-derived carbon are mainly employed for the carbon electrodes of supercapacitors. The different morphological forms of these carbon allotropes may affect the specific capacitance of supercapacitors [28]. ACs possess a large SSA and pore volume, thus easing the accumulation of static charges at the electrode surface and enhancing the resulting specific capacitance. HPC electrodes, instead, contain pores over a wide range of length scales (from micro to macro). The presence of macropores in HPC allows high-rate ion transport and acts as an ion reservoir. HA carbon electrodes are generally obtained from AC incorporated with heteroatoms (N, O, S, P), which enhance the wettability and electronic conductivity of the base material. Graphene also shows high SSA and electrical conductivity.
Therefore, we further split our dataset according to the type of carbon material used in the construction of the electrodes. AC, HPC, and heteroatom (HA)-doped electrodes were differentiated to generate separate datasets, and XGBoost was trained to best match the capacitive behaviour of supercapacitors made of specific carbon structures. Notice that, in this case, the datasets have not been subdivided according to different testing methods or electrolytes, since the limited number of entries available for some classes of carbon structures did not allow a robust training of the regressor. Figures 8 (a), (c) and (e) compare the predicted and actual values of specific capacitance for the three types of carbon structures considered, highlighting a good match especially for the HA ones (correlation statistics are detailed in Table 3). The feature analysis was also carried out in this case: Figures 8 (b), (d) and (f) identify as the most influential physicochemical features a combination of the parameters previously found for the two- and three-electrode testing methods, such as PW, SSA, and N%.
Overall, the current study focused mainly on the results obtained by applying XGBoost regressors, as their accuracy was shown to be superior to that of the other ML models. As a result of differentiating the datasets according to the testing method, electrolyte type or morphology of the carbon electrode material, the accuracy of the trained XGBoost models was further improved (see R² values), and the most relevant physicochemical features were identified for these different categorical variables. Considering all the feature analyses shown in Figures 4, 5, 6, 7, and 8, SSA is by far the dominant physicochemical characteristic of electrodes in determining the specific capacitance of the supercapacitors in the dataset, followed by N% and PW (which appears to be particularly influential when two-electrode methods are employed for the measurement). PV, ID/IG and PS follow with decreasing importance, while I, O% and S% are the least influential features.

Conclusions
ML models, namely OLS regression, SVR, DT, RF, and XGBoost, were used to predict the influence of various physicochemical parameters and categorical variables of carbon-based electrode materials on the capacitive performance of EDLC supercapacitors. First, a dataset was developed by extracting information from 147 experimental research articles on supercapacitors with carbon-based electrodes. The extracted features included the presence of defects, pore volume, pore size, current density, specific surface area, potential window, and the nitrogen, oxygen, and sulphur content of the carbon-based electrode materials. Categorical variables such as the testing method, electrolyte type and morphology of the carbon electrode material were also considered. The dataset was curated to achieve consistent physical units and to detect outliers, and the resulting data entries (4538) were fed into five regression models. Spearman's rank correlation coefficient was used to determine the correlation between each pair of parameters, which suggested that the available physicochemical parameters were not dependent on each other. For training the regression models, the datasets were divided in a 70:30 ratio for training and testing, respectively. The correlations between the actual and the predicted specific capacitance for the five models rank as follows: XGBoost > RF > SVR > DT > OLS, thus showing a superior regression performance by ML algorithms.
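The two data-preparation steps mentioned above, Spearman's rank correlation and the 70:30 split, can be illustrated with a minimal standard-library sketch. The function names, the random seed, and the toy SSA and capacitance values are hypothetical and are not taken from the dataset; the actual study may have relied on library routines instead.

```python
import random

def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def split_70_30(rows, seed=0):
    """Shuffle the rows and return (training 70%, testing 30%)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10
    return shuffled[:cut], shuffled[cut:]

# Toy, made-up values (SSA in m2/g, specific capacitance in F/g).
ssa = [400.0, 700.0, 1000.0, 1300.0, 1600.0, 1900.0, 2200.0, 2500.0, 2800.0, 3100.0]
capacitance = [80.0, 110.0, 140.0, 170.0, 165.0, 200.0, 230.0, 260.0, 290.0, 320.0]

rho = spearman(ssa, capacitance)
train, test = split_70_30(list(zip(ssa, capacitance)))
print(f"Spearman rho = {rho:.3f}; {len(train)} training and {len(test)} testing entries")
```

Working on ranks rather than raw values makes the coefficient sensitive to any monotonic relation, not only a linear one, which is presumably why it suits features with very different units and ranges.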
Additionally, we used the XGBoost model to predict the effect of the testing method (two- and three-electrode method) on the specific capacitance of supercapacitors. This resulted in acceptable performance parameters for both testing methods. In the three-electrode method, SSA, N%, and PV were identified as the major contributors, whereas in the two-electrode method SSA, PW, and ID/IG were observed to significantly influence the capacitive performance. To understand the impact of the electrolyte on the specific capacitance, we further extracted the data subsets with 6M KOH electrolyte in the two-electrode and three-electrode testing methods. The performance parameters obtained with XGBoost showed improved statistics for both testing methods. PV, SSA, and N% were identified as the significant contributors in the three-electrode method, whereas SSA, PS, and PW were confirmed to be the significant contributors in the two-electrode method. Finally, using the XGBoost model, we determined which physicochemical characteristics affect the specific capacitance of supercapacitors for each type of carbon material used in the construction of the electrode. The heteroatom (HA)-doped carbon exhibited a better regression than AC and HPC. Overall, SSA appears to be the most influential physicochemical characteristic of electrodes in determining the specific capacitance, followed by N% and PW. PV, ID/IG and PS are of decreasing importance, while I, O% and S% are the least influential.
We highlight that the imperfect match between the currently trained ML models and the considered experiments may also be due to different experimental conditions during supercapacitor testing. For instance, electrode conditioning before property measurements usually involves extensive charge-discharge cycling or holding the electrode for some time at an elevated temperature and potential; it typically decreases or even removes certain surface functional groups, which in turn improves storage stability. Unfortunately, not all researchers perform electrode conditioning before measuring properties, or identify possible current leakages when reporting measurements. This study, however, is not intended to replace modelling or experimental analyses, but rather to provide preliminary support in the design and development of new supercapacitors, with the possibility of re-training and thus refining the presented ML models as soon as further data and descriptors become available in the literature (e.g., extensive data on the carbon material percentage in the electrode, cf. Supplementary Note 5).
In conclusion, the current study demonstrated the successful use of a data-driven method to predict material performance for supercapacitor applications and revealed the most significant parameters that affect the specific capacitance of EDLCs. In perspective, the curated dataset developed and shared in this work may facilitate further analyses and potential optimization of carbon-based electrodes in different electrochemical applications. To ease the exploitation of the trained models (see Supplementary Note 6) by experimentalists, we developed a software tool with a graphical user interface (SUPERCAPs 69) that provides an estimate of the specific capacitance of carbon-based supercapacitors given the physicochemical characteristics and structure of the carbon electrodes, the testing method, and the electrolyte.

Porous and activated porous carbon
Because of their high specific surface area (SSA), improved electrical conductivity, adjustable pore sizes, electrochemical stability, and low cost, activated porous carbons (AC) offer significant promise for use as electrode materials [1][2][3][4]. These properties make AC an excellent material for a variety of applications, including water purification, gas separation and storage, and electrodes for capacitors, fuel cells and batteries. Porous/activated carbons are prepared by the pyrolysis of petroleum coke and coal followed by physical/chemical activation [5]. In recent years, synthesising AC from fossil raw materials has been strongly discouraged, and several efforts have been made to find sustainable and green sources of raw materials for AC preparation. In this direction, biological matter such as roots, flowers, stems, leaves, fungi, fruits, and animal body parts has been adopted as a resource for the synthesis of AC [6]. Additionally, several methodologies have been used to synthesise AC from biomass precursors, including pyrolysis, hydrothermal carbonization, mechano-chemical synthesis, and hard- and soft-templating [7].
In pyrolysis, the biomass is heated at an elevated temperature (T = 300-1000 °C) in the presence of an inert gas (e.g., nitrogen, argon). The conversion of biomass into AC involves several steps, including the removal of moisture (T < 100 °C), the degradation of cellulose and hemicellulose (200 °C < T < 500 °C), and the decomposition of lignin (T > 500 °C). The carbonization of biomass removes all the volatile materials, and the major part of the residual solid is carbon [8]. The application of pyrolysis-derived carbons (DPCs) in energy storage devices depends on their pore morphology and physicochemical characteristics.
Usually, DPCs have a low SSA and a pore morphology unsuitable for application in supercapacitors. Thus, an activation process (physical or chemical) is needed to enhance the physicochemical characteristics of DPCs. In physical activation, the carbon precursor is carbonized (T < 800 °C) and then activated at an elevated temperature in the presence of activating agents such as air, CO2, and steam. The chemical activation process requires mixing the carbon precursor with an activating agent at a suitable temperature [9]. Because of its lower activation temperature, high yield, and generation of microporous carbon with a large SSA, potassium hydroxide (KOH) is one of the most common chemical activating agents.
In hydrothermal carbonization (HTC), the carbonization of biomass occurs under natural conditions, i.e., in the presence of an aqueous medium at mild temperature (130-250 °C) and pressure (0.1 MPa). This process is very complex and usually comprises five steps: hydrolysis, dehydration, decarboxylation, polymerization, and aromatization [10]. Additionally, in HTC processes, the nature and morphology of the AC can be tuned by varying the temperature under mild processing conditions [10].
In general, ACs possess a large SSA (> 1000 m²/g) and pore volume (> 0.5 cm³/g), which are critical characteristics because the accumulation of static charges at the electrode surface determines the charge storage capacity of the supercapacitor. In practice, however, the capacitance of an AC-electrode supercapacitor is only about 10-20% of its theoretical capacitance [11]. This is due either to the presence of inaccessible micropores or to very large pores generated during carbon activation. Therefore, a high capacitive performance depends on characteristics of the electrode material such as a large SSA and an optimal pore-size distribution that eases the transport of electrolyte ions within the pores [11].
The cost of AC hinders its application as an electrode material in supercapacitors. Recent studies reported that AC can be successfully synthesised from various biological sources and wastes such as pitaya peel [12], corncob residue [13], water bamboo [14], a harmful aquatic plant (Alternanthera philoxeroides) [15], pumpkin [16], and pomelo peel [17]. These sources are cost-effective, environmentally friendly, abundant, and renewable [18]. One such low-cost bio-waste is rice husk, produced during the processing of rice. The annual production of rice husk is about 120 million tonnes worldwide [19], and its disposal is a serious environmental concern because the most common disposal method is open-air burning in an uncontrolled environment, which releases large amounts of carbon monoxide (CO) and carbon dioxide (CO2) [20]. Cellulose, hemicellulose, silica, lignin, and moisture are the main constituents of rice husk [21].
Guo et al. [22] used rice husk to synthesise AC with a high SSA (ranging from 1392 to 2721 m²/g) by alkaline hydroxide (KOH and NaOH) activation. They reported that an EDLC with a rice husk AC electrode could achieve a specific capacitance of 210 F/g in KCl solution. Furthermore, a high-performance supercapacitor using a rice husk-derived AC electrode, developed by He et al. [23], obtained a capacitance of 245 F/g at a current density of 0.05 A/g, with only a slight decrease in charge storage capacity (233 F/g) at a higher current density (2 A/g). Wang et al. [24] also synthesised an AC electrode from rice husk-derived hydrochar by KOH activation, which yielded a high specific capacitance of 312 F/g owing to the high SSA (3362 m²/g) of the activated rice husk hydrochar electrode. Additionally, Chen et al. [25] synthesised an AC electrode for EDLC supercapacitors with an SSA of approximately 4000 m²/g, which obtained a specific capacitance of 368 F/g in a 6 M KOH electrolyte.
Similar to rice husk, corn stalk core is a bio-waste that can be used to produce AC electrodes for supercapacitors. The corn stalk core contains natural pores distributed in a sponge-like structure, which makes it a suitable raw material for preparing high-surface-area AC electrodes. Yu et al. [26] prepared AC electrodes from corn stalk core with a high surface area (2349.89 m²/g) and determined a specific capacitance of 140 F/g (at a current density of 1 A/g). Additionally, Wang et al. [27] reported the conversion of corncobs to activated carbon by chemical activation for application in a supercapacitor. The resultant ACs exhibited a high SSA of 3054 m²/g, and specific capacitances of 401.6 F/g and 328.4 F/g in 0.5 M H2SO4 and 6 M KOH electrolyte, respectively, at a current density of 0.5 A/g. Karnan et al. [28] fabricated a supercapacitor device using an activated carbon electrode derived from corncobs and an ionic liquid electrolyte. With just a ten-second charge, the device could power an LED light for more than 4 minutes. The electrochemical performance of the AC electrodes in 0.5 M H2SO4 electrolyte also revealed the high capacity of the corncob electrode, with a specific capacitance of 390 F/g at 0.5 A/g [28]. Several authors have also developed AC electrodes from corn-based biomass such as corn straw, corn gluten, and popcorn, with improved SSA as well as reasonable electrochemical performance [29][30][31].
In a recent study, Rajabathar et al. [32] prepared a porous AC nanostructured electrode from jackfruit peel waste (JFPW), which showed outstanding electrochemical performance with a specific capacitance of 320 F/g at a low current density (1 A/g) in 1 M Na2SO4 electrolyte. They also reported that AC electrodes derived from jackfruit retained a high specific capacitance of 274 F/g even at a high current density (5 A/g). Similar evidence of a high-performance supercapacitor was reported for an AC electrode prepared from pitaya peel, with a specific capacitance of 255 F/g at a current density of 1 A/g, 96.4% capacity retention at 5 A/g, and excellent stability in 6 M KOH electrolyte [12]. Additionally, Lin et al. [17] synthesised hemicellulose AC raw material from pomelo peel for an electrode with a high SSA of 1361 m²/g and an excellent charge storage capacity of 302.4 F/g at a current density of 0.5 A/g. Three-dimensional sakura-based activated carbons have also been used as the raw material for supercapacitor electrodes. Their electrochemical performance was analysed with a three-electrode testing method in 6 M KOH electrolyte, which gave a specific capacitance of 265.8 F/g at a current density of 0.2 A/g [33]. Chang et al. also synthesised ACs using paulownia flower as the precursor, which offered a specific capacitance of 297.1 F/g at 1 A/g in 1 M H2SO4 electrolyte [34]. Zhang et al. [35] reported the conversion of bamboo, through carbonization and chemical activation, into activated porous carbon for supercapacitor electrodes with a high specific capacitance of 293 F/g at a current density of 0.5 A/g. Wang et al. [36] modified commercially available coconut shell-based AC (CSAC) through H2O plasma, an environmentally friendly method that generates an electrode material (HCSAC) with enhanced specific capacitance and retention capability compared with its precursor CSAC.
Activated carbon-based supercapacitor electrodes have also been synthesised from low-cost, highly porous willow wood. The resultant ACs exhibit a high surface area (2793 m²/g) and pore volume (1.45 cm³/g), and contain both micro- and mesopores, which are favourable for energy storage [37]. The resulting AC-electrode supercapacitor also showed excellent electrochemical performance, with a specific capacitance of 394 F/g at a current density of 1.0 A/g in an aqueous electrolyte (6 M KOH) [37].
One socio-economic and environmental concern worldwide is the removal of blooms of the seaweed Ascophyllum nodosum, which is abundant in northern oceans. Chemically activated biocarbon derived from Ascophyllum nodosum can also be used for supercapacitor electrodes. Perez-Salcedo et al. [38] synthesised such an activated biocarbon electrode for supercapacitors, with a capacitance of 207.3 F/g at a current density of 0.5 A/g, excellent stability and retention capability.

Hierarchical porous carbon
Hierarchical porous carbon (HPC) contains pores over a wide range of length scales, a feature that is missing in conventional porous materials. HPC contains macropores (>50 nm), mesopores (2-50 nm), and micropores (<2 nm). The macropores allow high-rate ion transport and act as ion reservoirs. Furthermore, the interconnected mesopores provide low-resistance pathways for the diffusion of ions, and the high SSA of the micropores enhances the adsorption of ions at the pore surface [39]. These unique properties have recently attracted interest in HPC as an electrode material for supercapacitors. The development of a hierarchical porous structure from carbon material requires templating techniques, of which there are two types: soft templating and hard templating. In the soft-template method, the template serves as a substance for self-assembly, whereas the hard-template method consists of three steps: 1) impregnating the pre-synthesised template; 2) carbonization; 3) peeling off the hard template. It is followed by the carbon conversion process (carbonization) and etching [39].
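As a small illustration of the pore-size ranges quoted above, the following sketch assigns a pore width to one of the three classes. The function name and the handling of the boundary values (2 nm and 50 nm counted as mesopores) are assumptions made for this example only.

```python
def classify_pore(width_nm):
    """Classify a pore by width using the ranges quoted in the text."""
    if width_nm < 2.0:
        return "micropore"   # < 2 nm
    if width_nm <= 50.0:
        return "mesopore"    # 2-50 nm (boundaries assigned here by assumption)
    return "macropore"       # > 50 nm

# Hypothetical pore widths in nm, not taken from the dataset.
for width in (1.05, 10.0, 120.0):
    print(width, classify_pore(width))
```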
However, these methods are complex, time-consuming, and expensive. In addition, porosity is hard to regulate, which seriously hinders the use and large-scale production of HPC. Thus, there is a particular need for an easy, inexpensive, and eco-friendly procedure for the synthesis of HPC materials.
The synthesis of AC material from natural biomass such as cotton is an eco-friendly and economical approach to developing electrodes for supercapacitors. The bio swelling of cotton fibres under the influence of NaOH/urea enables the formation of HPC fibres with improved surface characteristics [40]. The cotton fibre-derived HPC electrode material possesses a high SSA (584.49 m²/g), along with favourable pore morphologies that enhance the specific capacitance (221.7 F/g at 0.3 A/g) [40].
Bagasse (a biomass waste from the sugar industry) has been used to synthesise carbonaceous materials for the absorption of heavy metal ions and organic pollutants, and it is also applicable to energy storage. However, the carbonaceous material derived from bagasse has a narrow pore-size distribution and insufficient SSA, which restricts its applicability in supercapacitors. Feng et al. [41] developed a simple method for the synthesis of HPC from bagasse using sewage sludge-assisted hydrothermal carbonization with KOH activation. This process is a cheap and efficient way to regulate the porosity and structure of HPC and results in excellent supercapacitive performance. The bagasse-derived hierarchical structured carbon (BDHSC) electrode supercapacitor has a capacity of 320 F/g at a current density of 0.5 A/g with good cyclic stability. Zhou et al. [42] synthesised nitrogen-doped porous carbons (HNPCs) from the biomass precursor cellulose carbamate, with tuneable pore structures and ultrahigh SSAs, via simultaneous carbonization and activation in a facile one-pot approach. The HNPCs exhibited an ultrahigh specific surface area (3700 m²/g), a high pore volume (3.60 cm³/g) and high-level nitrogen doping (7.7%). In a three-electrode system, the HNPCs showed a specific capacitance of 339 F/g with 6 M KOH electrolyte and 282 F/g with 1 M H2SO4 electrolyte at a current density of 0.5 A/g, whereas in a two-electrode system they exhibited a high specific capacitance of 289 F/g at 0.5 A/g. Gou et al. [43] prepared an HPC electrode material for supercapacitors from wheat straw cellulosic foam, with a high SSA of 772 m²/g after KOH activation and micropores ranging from 1.05 to 1.74 nm, tested with 6 M KOH electrolyte. The high porosity improves the migration of ions in the electrolyte, thus enhancing the electrochemical characteristics of the capacitor. In a three-electrode system, they obtained a capacitance of 226.2 F/g at a current density of 0.5 A/g. This provides a route to electrode materials from cheap and easily available wheat straw.
In a recent study, Zhao et al. [44] prepared N-O co-doped AC from low-cost sodium alginate (SA) particles for the development of supercapacitors. They obtained the N-O co-doped AC by crosslinking SA beads with diammonium chains through the electrostatic interaction between ammonium cations and the carboxylate groups of the SA chains. Both the species (N-O) and the concentration of the diammonium chains strongly affected the electrochemical characteristics of the N-O co-doped AC. They obtained a capacitance of 269.0 F/g at a current density of 1 A/g.

Heteroatom doped porous carbon
Besides the SSA and pore volume, several other factors influence performance (pseudocapacitance and overall electric capacity), such as surface functional groups and conductivity. These can be tuned by doping carbon with heteroatoms (nitrogen, oxygen, sulphur, etc.), which not only enhances the wettability but also improves the electronic conductivity of AC [45]. Among these heteroatoms (N, O, S, P), sulphur is the most reactive element. Sulphur doping of carbonaceous material increases its bandgap, which enhances the electron donor properties and changes the electronic density of states. Sulphur doping also increases wettability, which in turn decreases the diffusion resistance between the electrode and the electrolyte ions [46].
Li et al. [47] used willow catkin to develop porous carbon nanosheets (PCNs) by pyrolysis and activation, followed by co-doping with nitrogen and sulphur. The N-S co-doped carbon nanosheet electrode supercapacitor has a capacity of 298 F/g at a current density of 0.5 A/g, obtained with green and low-cost electrode materials. Wang et al. [45] reported a porous nitrogen self-doped carbon material with a layered structure for high-performance supercapacitors. It was derived from porcine bladders, a by-product of the pig-farming industry, which contain C, N and O elements in abundance. Combining carbonization and KOH activation yielded the nitrogen self-doped layered AC. The KOH dosage can be changed to adjust the amount of N and the pore structure. The material has outstanding electrochemical characteristics, including a high specific capacitance of 322.5 F/g and good cycle stability over 5000 cycles.
Kim et al. [43] reported a straightforward method for producing biomass-derived AC with a high surface area and heteroatom doping. It involves the exothermic pyrolysis of a Mg/K/MgK-nitrate-urea-cellulose mixture followed by high-temperature carbonization and a washing treatment. The produced N-doped AC material shows a specific capacitance of 279 F/g at 1 A/g in 6 M KOH electrolyte with the two-electrode testing method. Wan et al. [48] prepared three ACs for supercapacitors from lotus pollen, activated with ZnCl2, FeCl3, and CuCl2. The AC obtained by CuCl2 activation exhibits a higher surface area, greater porosity and a higher degree of heteroatom doping than the ACs traditionally activated with ZnCl2 or FeCl3.
Demir et al. [49] reported a method for the sustainable and economic transformation of the waste product lignin (a by-product of the paper and pulp industry) into heteroatom-doped AC for supercapacitor electrodes and CO2 capture applications. The synthesis process involves carbonization and chemical activation. The synthesised AC contains 2.5 to 5.6 wt.% nitrogen and 54 wt.% oxygen in its final structure. It possesses a high surface area of 1788 to 2957 m²/g, a capacitance of 372 F/g and excellent cyclic stability over 30,000 cycles in 1 M KOH. Razmjooei et al. [50] developed AC from the most available human waste, urine. The process started with the removal of mineral salts from urine carbon (URC), which makes it more porous, followed by heteroatom doping (N, S and P). The combined effect of surface properties and porous structure makes the material suitable for energy storage applications. It exhibits a surface area of 1040.5 m²/g, good conductivity and N, S and P heteroatom doping, with a capacitance of 166 F/g at 0.5 A/g in a three-electrode testing method.

Graphene derived carbon
In perspective, graphene is the most promising electrode material for various energy storage applications, in particular supercapacitors, owing to its high electrical conductivity, high SSA, and excellent mechanical strength [51]. Its porous structure also facilitates charge transport in the supercapacitor. These exceptional properties make graphene a suitable candidate for supercapacitor electrodes.
The SSA of graphene is highly tuneable according to the requirements of the supercapacitor electrode for energy storage applications. Also, the presence of highly mobile free π electrons in its orbitals is responsible for the exceptionally high electrical conductivity [51]. Furthermore, the electrical behaviour of graphene can be improved through functionalization [52] and heteroatom doping [53]. Torabi et al. [54] synthesised nanocomposite electrodes consisting of porous graphene nanoribbons (PGNRs) and carbon black (CB). This PGNRs/CB electrode has a large specific area (1062.5 m²/g) and a capacity of 223.0 F/g at a current density of 1.0 A/g. Supercapacitor electrodes have also been developed by intercalating the copolymer Pluronic F127 between the layers of reduced graphene oxide (rGO) sheets. The intercalation of the copolymer increases the surface area and pore volume, which enhances the surface wettability and improves the electrochemical performance [55]. In another study, a hydrazine-reduced graphene hydrogel (GH-Hz) electrode showed high electrical conductivity, a specific capacity of 220 F/g at 1 A/g and a high retention capability (around 74%) at a high current density (100 A/g) [56]. Xu et al. reported an efficient way to prepare holey graphene oxide through a scalable defect-etching strategy that creates numerous nanopores across the GO plane. Further reduction of holey graphene oxide using H2O results in a three-dimensional hierarchical porous holey graphene hydrogel with significantly enhanced ion transport and surface area [57].
Consider decision boundaries at a distance ε from the hyperplane, i.e., the lines drawn at a distance +ε and −ε from it. The equations of the decision boundaries then become:
$$w x + b = +\varepsilon, \tag{S5}$$
$$w x + b = -\varepsilon. \tag{S7}$$
The key goal here is to choose a decision boundary that lies at a distance ε from the initial hyperplane and contains the data points closest to the hyperplane.
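The ±ε boundaries can be read as a tube around the regression hyperplane. A minimal sketch follows, assuming the standard ε-insensitive loss of support vector regression (the loss itself is not spelled out in this note): deviations inside the tube cost nothing, and deviations outside grow linearly. The linear model, its coefficients `w` and `b`, the value of `eps`, and the sample points are all hypothetical.

```python
def predict(w, b, x):
    """Linear model y = w * x + b for a single feature."""
    return w * x + b

def epsilon_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the tube |y_true - y_pred| <= eps, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

# Hypothetical coefficients and tube half-width (F/g), for illustration only.
w, b, eps = 0.1, 50.0, 10.0

# Made-up (SSA in m2/g, specific capacitance in F/g) pairs.
samples = [(1000.0, 155.0), (2000.0, 270.0), (3000.0, 335.0)]

losses = [epsilon_insensitive_loss(y, predict(w, b, x), eps) for x, y in samples]
for (x, y), loss in zip(samples, losses):
    inside = "inside" if loss == 0.0 else "outside"
    print(f"x = {x:.0f}: prediction {predict(w, b, x):.1f}, target {y:.1f}, {inside} the tube, loss {loss:.1f}")
```

Only points outside the tube contribute to the fit, which is why the points closest to the boundaries matter most when the boundaries are chosen.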

Decision tree (DT)
DT constructs regression or classification models based on the data features, arranged in a tree configuration. In a tree, every node is related to a property of a data feature, and the tree predicts either the target value (regression) or the target class (classification). The closer a node is to the root of the tree, the greater its influence [63]. Some benefits of the DT include:
• It is easy to understand, analyse, and interpret.
• It is capable of handling both categorical and numerical data.
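To make the role of a single node concrete, the sketch below searches for the threshold on one feature that minimises the summed squared error of the two resulting groups, which is the usual CART criterion for regression. The note does not state which splitting criterion was used, so this choice is an assumption, and the function names and toy data are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(feature, target):
    """Return (threshold, error) of the single split with the lowest total squared error."""
    pairs = sorted(zip(feature, target))
    best_threshold, best_error = None, sum_squared_error(target)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        error = sum_squared_error(left) + sum_squared_error(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error

# Made-up values: SSA (m2/g) and specific capacitance (F/g).
ssa = [500.0, 800.0, 1000.0, 2000.0, 2400.0, 3000.0]
capacitance = [100.0, 110.0, 105.0, 250.0, 260.0, 255.0]

threshold, error = best_split(ssa, capacitance)
print(f"split at SSA <= {threshold}, remaining squared error {error}")
```

A full tree repeats this search recursively inside each group, so the first (root) split acts on all samples and therefore has the largest effect on the prediction.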

Random forest (RF)
RF is an ensemble learning technique that can perform both regression and classification tasks using multiple decision trees. During training, the algorithm generates a large number of decision trees using a probabilistic scheme [64]: every tree is trained on a bootstrapped sample of the original training data and, at each node, considers only a randomly selected subset of the input variables to determine the split. Every tree in the RF makes its own individual prediction, or casts a unit vote for the most popular class at input x. These predictions are then averaged in the case of regression, or the majority vote determines the output in the case of classification [64]. The core concept is to use numerous decision trees to determine the final output rather than depending on an individual decision tree.
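The three ingredients described above (bootstrapped samples, a random subset of input variables, and averaging for regression) can be sketched with one-split trees. This is a deliberately reduced illustration, not the implementation used in the study: each tree is a single-node stump, the random feature subset has size one, and the names, seed, and toy data are hypothetical.

```python
import random

def fit_stump(rows, feature):
    """Fit a one-split regression tree on one feature; rows = [(features, target), ...]."""
    rows = sorted(rows, key=lambda r: r[0][feature])
    targets = [t for _, t in rows]
    overall = sum(targets) / len(targets)
    best = None  # (error, threshold, left_mean, right_mean)
    for i in range(1, len(rows)):
        low, high = rows[i - 1][0][feature], rows[i][0][feature]
        if low == high:
            continue
        left, right = targets[:i], targets[i:]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, (low + high) / 2, left_mean, right_mean)
    if best is None:  # the bootstrap sample holds a single distinct feature value
        return lambda features: overall
    _, threshold, left_mean, right_mean = best
    return lambda features: left_mean if features[feature] <= threshold else right_mean

def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        feature = rng.randrange(n_features)        # random feature subset of size 1
        trees.append(fit_stump(sample, feature))
    return trees

def predict_forest(trees, features):
    """Regression output: the average of the individual tree predictions."""
    return sum(tree(features) for tree in trees) / len(trees)

# Made-up entries: (SSA in m2/g, N content in %) -> specific capacitance in F/g.
rows = [((500.0, 0.5), 95.0), ((900.0, 1.0), 120.0), ((1400.0, 2.0), 160.0),
        ((2000.0, 3.0), 210.0), ((2600.0, 4.5), 260.0), ((3200.0, 6.0), 310.0)]

trees = fit_forest(rows)
low_pred = predict_forest(trees, (500.0, 0.5))
high_pred = predict_forest(trees, (3200.0, 6.0))
print(f"low-SSA prediction {low_pred:.1f} F/g, high-SSA prediction {high_pred:.1f} F/g")
```

Averaging over trees trained on different samples and features reduces the variance of any single tree, which is the motivation given in the text for not depending on an individual decision tree.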

Extreme Gradient Boosting (XGBoost) model
XGBoost is one implementation of gradient boosting machines (GBMs), designed mainly for speed and performance. GBMs are among the most effective algorithms for supervised learning, in which various features of the training data are used to predict the target values. XGBoost applies classification and regression trees (CART) algorithms to a known dataset and categorises the data accordingly [65]. For a dataset consisting of n samples and m features, $\mathcal{D} = \{(x_i, y_i)\}$ ($|\mathcal{D}| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), the expression of an XGBoost algorithm, with $K$ the total number of CART trees, is as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}.$$
In the scoring function that XGBoost uses to choose splits, $\tilde{\mathcal{L}}^{(t)}(q)$ is the second-order approximation of the loss function at the t-th iteration, and $g_i$ and $h_i$ are the first- and second-order gradients of the loss on the i-th data point, respectively. The instance set of a specific leaf node j is $I_j$. As a result, XGBoost can reduce the loss iteratively and achieve better results than other ensemble algorithms.

Figure 3: Spearman's correlation between the physicochemical characteristics of the carbon electrodes of supercapacitors in the considered dataset.

Figure 4: Comparison between the actual specific capacitance from literature research articles in the dataset and the predicted specific capacitance from (a) OLS, (b) SVR, (c) DT, (d) RF, and (e) XGBoost models.

Figure 5: (a) Comparison between the predicted and actual capacitance values obtained by the three-electrode method; (b) feature analysis for the three-electrode method of testing. (c) Comparison between the predicted and actual values for the two-electrode testing method; (d) feature analysis for the two-electrode method of testing.
The accuracy of the obtained regression is corroborated by the improved statistics of the XGBoost model fitting for the 6M KOH electrolyte (R² = 0.81, RMSE = 33.86, b' = 1.01, and MAPE = 16.75) and the 1M H2SO4 electrolyte (R² = 0.87, RMSE = 36.48.44, b' = 0.98, and MAPE = 116.24), as depicted in Figure 6 (a) and 6 (c). The feature analyses carried out on supercapacitors with 6M KOH and 1M H2SO4 electrolytes demonstrate that SSA, nitrogen doping, and PV were the major contributors to the capacitive performance in the 6M KOH electrolyte, whereas SSA, nitrogen doping, and PW were the major contributors in the 1M H2SO4 electrolyte, as evident from Figure 6 (b) and 6 (d).
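For reference, three of the statistics quoted above can be computed as in the following sketch, which assumes the usual textbook definitions of R², RMSE and MAPE (with MAPE expressed in percent). The parameter b' is not defined in this excerpt and is therefore left out, and the actual and predicted values are made up for the example.

```python
import math

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error, in the units of the target (F/g here)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Made-up specific capacitances in F/g.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 380.0]

print(f"R2 = {r2_score(actual, predicted):.3f}, "
      f"RMSE = {rmse(actual, predicted):.2f} F/g, "
      f"MAPE = {mape(actual, predicted):.2f} %")
```

RMSE penalises large absolute errors, whereas MAPE weights every entry by its own magnitude, so the two can rank models differently when the capacitance values span a wide range.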

Figure 6: (a) Comparison between predicted and actual specific capacitance and (b) feature analysis for the subset of data having 6M KOH electrolyte. (c) Comparison between predicted and actual specific capacitance and (d) feature analysis for the subset of data having 1M H2SO4 electrolyte.
(a) and (b), where most data points are located near the diagonal line, indicating a strong correlation between the actual and predicted specific capacitance values. Furthermore, the accuracy of the regression is corroborated by the improved statistics of the XGBoost model fitting for both the two-electrode (R² = 0.95, RMSE = 16.56, b' = 0.97, and MAPE = 10.15) and the three-electrode method (R² = 0.91, RMSE = 24.08, b' = 0.985, and MAPE = 12.88). Similarly to Figure 5, the feature analysis carried out on supercapacitors with 6M KOH electrolyte demonstrates that SSA, PV, and nitrogen doping were the major contributors to the capacitive performance in the three-electrode testing method (Figure 7 (b)), whereas PW, PS, and SSA were the major contributors in the two-electrode one (Figure 7 (d)). Hence, the regression performed on the limited dataset of supercapacitors with 6M KOH electrolyte does not change the relative influence of the physicochemical characteristics of supercapacitors on their performance discussed in Section 3.3, thereby showing the robustness of the feature analysis.

Figure 7: (a) Comparison between predicted and actual specific capacitance and (b) feature analysis for the subset of data having 6M KOH electrolyte and measured by three-electrode method. (c) Comparison between predicted and actual specific capacitance and (d) feature analysis for the subset of data having 6M KOH electrolyte and measured by two-electrode method.

Figure 8. (a) Comparison between the predicted and actual specific capacitance and (b) feature analysis for the activated carbon electrodes. (c) Comparison between the predicted and actual specific capacitance and (d) feature analysis for the hierarchical porous carbon electrodes. (e) Comparison between the predicted and actual specific capacitance and (f) feature analysis for the heteroatom-doped carbon electrodes.
In the XGBoost model expression, $\mathcal{F} = \{f(x) = w_{q(x)}\}$ ($q: \mathbb{R}^m \to T$, $w \in \mathbb{R}^T$) is the space of CART trees, and $q(x)$ maps an input $x$ to a leaf node of a CART tree. The symbols $w$ and $T$ represent the leaf weights and the number of leaves in a tree, respectively. As a result, XGBoost calculates a final score by adding up the weights from each CART tree. The learning goal is to determine the appropriate weights and splitting threshold for each tree node to reduce model complexity. The total loss function of XGBoost is defined as a squared loss plus a regularization term:
$$\mathcal{L} = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k).$$
RFs reduce this loss function by dividing features based on the most significant Gini information gain and by randomly assembling CART trees. XGBoost converts the loss function into a new scoring function that can be used to choose the best threshold [66]:
$$\tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T.$$
Here $\lambda$ and $\gamma$ are the regularization coefficients contained in $\Omega$.
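A short numerical sketch may help to see how such a scoring function selects a split. It assumes the standard XGBoost structure score, −½ Σ_j G_j²/(H_j + λ) + γT, where G_j and H_j are the sums of the first- and second-order gradients in leaf j, together with the optimal leaf weight −G_j/(H_j + λ). The loss is taken as ½(ŷ − y)², so that g = ŷ − y and h = 1. The values of λ and γ, the function names, and the data are hypothetical.

```python
def leaf_weight(g, h, lam):
    """Optimal leaf weight w* = -G / (H + lambda) for the gradient sums G and H."""
    return -sum(g) / (sum(h) + lam)

def structure_score(leaves, lam, gamma):
    """Score of a tree structure: -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T.

    `leaves` is a list of (g_list, h_list) pairs, one pair per leaf; lower is better.
    """
    total = sum(sum(g) ** 2 / (sum(h) + lam) for g, h in leaves)
    return -0.5 * total + gamma * len(leaves)

# Made-up targets (F/g) and a constant current prediction of 200 F/g.
targets = [100.0, 120.0, 300.0, 320.0]
current = [200.0] * len(targets)

# Gradients of the assumed loss 1/2 * (prediction - target)^2.
g = [p - y for p, y in zip(current, targets)]
h = [1.0] * len(targets)

lam, gamma = 1.0, 0.0  # hypothetical regularization settings

one_leaf = structure_score([(g, h)], lam, gamma)
two_leaves = structure_score([(g[:2], h[:2]), (g[2:], h[2:])], lam, gamma)
gain = one_leaf - two_leaves  # positive gain means the split lowers the score

print(f"score without split {one_leaf:.1f}, with split {two_leaves:.1f}, gain {gain:.1f}")
print(f"leaf weights: {leaf_weight(g[:2], h[:2], lam):.1f} and {leaf_weight(g[2:], h[2:], lam):.1f}")
```

With γ > 0 every extra leaf adds a fixed penalty, so a split is kept only when the reduction of the first term outweighs it; this is how the regularization term limits model complexity.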

Table 1: Spearman's rank correlation coefficient between the specific capacitance and different features of carbon supercapacitors in the considered dataset.

Table 2. Performance analysis of the different regression models on the collected dataset of carbon-based supercapacitors.

Table 3. Performance analysis of the trained XGBoost models to relate the physicochemical features of carbon electrodes with different structures and the measured specific capacitance.

Table S1. Information on the literature sources analysed to implement the dataset.