Comparison of computational chemistry methods for the discovery of quinone-based electroactive compounds for energy storage

High-throughput computational screening (HTCS) is a powerful approach for the rational and time-efficient design of electroactive compounds. The effectiveness of HTCS is dependent on accuracy and speed at which the performance descriptors can be estimated for possibly millions of candidate compounds. Here, a systematic evaluation of computational methods, including force field (FF), semi-empirical quantum mechanics (SEQM), density functional based tight binding (DFTB), and density functional theory (DFT), is performed on the basis of their accuracy in predicting the redox potentials of redox-active organic compounds. Geometry optimizations at low-level theories followed by single point energy (SPE) DFT calculations that include an implicit solvation model are found to offer equipollent accuracy as the high-level DFT methods, albeit at significantly lower computational costs. Effects of implicit solvation on molecular geometries and SPEs, and their overall effects on the prediction accuracy of redox potentials are analyzed in view of computational cost versus prediction accuracy, which outlines the best choice of methods corresponding to a desired level of accuracy. The modular computational approach is applicable for accelerating the virtual studies on functional quinones and the respective discovery of candidate compounds for energy storage.

(1) The 2D representation is converted to a three-dimensional (3D) geometry by applying the geometry optimization (OPT) scheme of OPLS3e FF 36 to identify the lowest energy 3D conformer. As shown in Fig. 1b, the FF level geometry is the starting point for all approaches. (2) The 3D geometry is further optimized in the gas-phase at three different levels of theory, namely: SEQM, DFTB, and DFT. For SEQM and DFT, geometry optimizations have also been performed in an implicit aqueous-phase, but for simplicity they are not shown in Fig. 1. This step yields different 3D geometries and the corresponding SPE of the compounds. (3) Next, SPEs of all the different 3D geometries are calculated using different DFT functionals. This step yields energy values that are directly comparable but are obtained from geometry optimizations that have been performed at four different levels of theory. (4) Lastly, for the geometries obtained from the optimizations in the gas-phase, the SPEs are recalculated, this time by including the effect of the aqueous medium (SOL) implicitly by using the Poisson-Boltzmann solvation model (PBF) 37,38 .

Results and discussions
Comparison of DFT methods. Since DFT is the highest level of theory considered in the current study, we begin with a discussion of the performance of the various DFT functionals, also with an aim to use them as performance benchmarks for low-level methods. Initially, we briefly discuss the performance of U rxn and G o rxn as chemical descriptors to predict redox potentials. For this purpose, DFT energy calculations using the PBE functional are performed, first for optimizing geometries in gas-phase and then for calculating single point energies in an implicit aqueous-phase. The calibration performances of U rxn (RMSE = 0.049 V, R 2 = 0.978) and G o rxn (RMSE = 0.048 V, R 2 = 0.979) are very similar, as shown in Supplementary Fig. S1. Inclusion of ZPE in U rxn , as well as entropic effects in G o rxn , is only marginally better than using E rxn (RMSE = 0.051 V, R 2 = 0.977). Therefore, we consider that the effects of including these terms are not significant enough from HTCS perspective. Accordingly, all the following discussions in this work consider E rxn as the descriptor.
First, we discuss linear calibrations of three representative DFT functionals of PBE (Fig. 2a), B3LYP (Fig. 2b), and M08-HX (Fig. 2c). The results shown in Fig. 2 have been obtained by using three kinds of DFT calculated reaction energies E rxn = E DFT rxn , against the experimentally measured redox potentials ( E o exp ) as follows: (1) OPT in gas-phase without calculation of SPE in SOL, (2) OPT in gas-phase and a following SPE in SOL, and (3) both OPT and SPE in SOL.
On comparing RMSE and R 2 data, the following observations are made: (1) When using E DFT rxn from only gas-phase optimized geometry and SPE, PBE is the least accurate functional at the GGA level with RMSE = 0.072 V, R 2 = 0.954. Nevertheless, the results show that at any DFT level, it is possible to predict E o exp for quinone-based molecules within a range of common experimental errors (~ 0.1 V).
Scientific Reports | (2020) 10:22149 | https://doi.org/10.1038/s41598-020-79153-w www.nature.com/scientificreports/ (2) Upon the inclusion of solvation effects using an implicit model on the gas-phase geometries, all the three functionals show a decrease in their RMSE values. The percentage decrease in error is highest for PBE (30%) and lowest for M08-HX (23%). (3) Remarkably for all the three considered functionals, full geometry optimizations and energy calculations in an implicit solvation model yield slightly worse results than their counterparts in which geometries are optimized in gas-phase. The RMSEs increase by 0.002-0.004 V, indicating that there is no real added value of performing geometry optimizations with implicit solvation, not to mention that they are also computationally more demanding.
Based on the findings above, we evaluate the performances of eight other DFT functionals without considering geometry optimizations in an implicit solution-phase. In Fig. 2d and Supplementary Fig. S2, a summary of the performance of all the DFT functionals considered in this work is presented using bar plots for RMSE and R 2 , respectively. When compared under the same set of conditions, all functionals, with the exception of LDA, show a similar performance. The PBE0/PBE0-D3, HSE06 and M08-HX functionals show a highly similar performance when using implicit solvation on gas-phase optimized geometries, which are followed by the other hybrid The hollow black arrows with symbol Δ t represent the difference between gas-phase reaction energies of the optimized geometries obtained at different levels of theory. The solid black arrows with symbol Δ e represent the difference between energies computed using DFT on a fixed geometry obtained from a lower-level theory and energies computed at that given level of theory. The dotted gray arrows represent the solvation effect from gas-phase SPE to solution-phase SPE, when the implicit solvation model is considered. For both (a,b), the text boxes with no background color represent geometry optimizations; the boxes with colored background represent SPE calculations; the boxes with water bubbles in the background represent solution-phase SPE calculations using the implicit aqueous solvent model. For all further comparisons in this work, we choose, among the compared exchange-correlation functionals, PBE as the benchmark DFT functional as it offers the best compromise between prediction accuracy and computational cost. We note that, the different DFT functionals have been compared here purely on the basis of their performance in predicting the measured potentials. We also note that, functionals constructed with higher degrees of empiricism, such as the Minnesota density functionals 39 , are aimed at producing better values for a chosen set of physically observable properties. In this regard, it is not surprising that the M08-HX performs best among the functionals, as it is heavily parametrized to show good performance for thermochemistry. However, it must also be kept in mind that such heavily parametrized functionals tend to produce less accurate electron densities than the ones with little to no empiricism in their design (e.g. the PBE functional) 40 . , and (c) M08-HX. The bar plot (d) shows the RMSE for all the functionals considered. The color green represents both OPT and SPE in gas-phase, the color orange represents OPT in gas-phase followed by SPE with SOL, and the color blue represents both OPT and SPE with SOL, as also summarized in a tabular format. In (d), the horizontal dashed green line represents PBE g (RMSE = 0.072 V, R 2 = 0.954) benchmark, the horizontal dashed blue line represents PBE aq (RMSE = 0.053 V, R 2 = 0.975) benchmark, and the horizontal dashed orange line represents PBE s (RMSE = 0.051 V, R 2 = 0.977) benchmark.

Discussion
As shown in Fig. 2d, the DFT LDA data is not as good as the GGA and the hybrid method calculated data. Additionally, there is a positive impact on the prediction accuracies due to the inclusion of implicit solvation in calculations of E DFT rxn . The effect can be attributed to a better accounting of the −OH groups' interactions with the surrounding aqueous environment in the hydroquinone products 32 . Surprisingly, optimizing geometries with implicit solvation slightly worsens the prediction accuracies. This can be attributed to multiple factors. First, it is possible that the PBF solvation model is not accurate enough to improve the gas-phase geometry. Secondly, there might be a serendipitous cancellation of errors when using the gas-phase geometry that is affected by the changes in the geometry due to the implicit solvation model in use. Additionally, E rxn is used as an approximation for G o rxn , and accounting for the ignored pressure-volume and entropy terms from Eq. (6) might result in better prediction accuracies when optimizing the geometries in solution. In the work of Kim et al. 32 it was shown that the reduction potentials of Anthraquinones in acidic aqueous solutions are strongly influenced by specific interactions with molecules in solvent environment. In aqueous solution, they found that using DFT (ωB97X-D/6-31G * ) with implicit solvation (PCM(Bondi)) for geometry optimizations yields good results, except for the redox couples that have strong intramolecular hydrogen bond interactions. They evaluated a total of 19 Anthraquinones and reported a mean absolute deviation (MAD) of 0.194 V for three outliers that showed strong intramolecular hydrogen bond interactions. This value was more than five times the MAD value of the remaining 16 redox couples (0.037 V). Further, they showed that QM/MM calculations (with the TIP3P force field for explicit water molecules) alleviate the overestimation and lead to a more balanced treatment of solute-solvent interactions. Accordingly, using a QM/MM model, the correlation between theory and experiment data had a MAD of 0.033 V. In the current work, we performed a similar analysis on our calibration data set of 43 molecule pairs. We found that there are only 12 molecules, with IDs: 1, 2, 3, 4, 5, 6, 8, 9, 16, 35, 37 and 39 shown in Supplementary Table S1, without any possibility of strong intramolecular hydrogen bond interactions due to the neighboring positions found in the hydroquinone versions of the molecules. Surprisingly, we found that when using implicit aqueous solvation during geometry optimizations, the MADs for the predicted redox potentials were 0.039 V for the 12 molecules and 0.037 V for the remaining 31 molecules. We note that these MADs are very similar and the difference between the two groups is only in the third decimal digit. Therefore, we cannot confirm that the explanation provided by Kim et al. also applies to the methods used in this work. At the same time, it must be noted that Kim et al. used only Anthraquinones (3-ring molecules) for their analysis, whereas this work considers a wide variety of quinone molecules (from 1 to 3 rings), including those with the C=O groups at the 1,2 positions on the compounds. Further, Kim et al. employed the PCM (Bondi) implicit solvation model, which is different from the PBF model used in the current work. These differences, as well as the difference in the calibration data, make it hard to ascertain the exact origin of the disparities between this work and the work of Kim et al.
Another important aspect of the calibration of molecules that needs to be considered is the effect of ionization of sulfonic acid groups, as they are prone to dissociation in aqueous media. In the calibration set of 43 molecule pairs used in the current work, there are 18 molecules that contain -SO 3 H groups, with IDs: 7,8,9,10,11,12,13,30,31,32,33,34,35,36,37,41,42 and 43 as shown in Supplementary Table S1. In the framework of the best performing scheme, i.e. PBE s , the calculated MAD for these 18 molecules is 0.047 V, which is approximately 50% higher than the MAD of the remaining 25 molecules (0.032 V). Clearly, the ionization of sulfonic groups has adverse effects on the prediction accuracies. Although the effect is not significant from a perspective of HTCS studies, we recommend the inclusion of explicit water molecules, such as in a QM/MM formalism, when extremely accurate values for the sulfonic group decorated quinone redox potentials are sought 32 .
In order to understand the existence of functional group dependent rational trends, we selected anthraquinone (Molecule ID 5 in Supplementary Table S1) as the base molecule from our calibration set and found six derivatives with -SO 3 H ( Supplementary Fig. S3a) and six derivatives with -OH functional groups (Supplementary Fig. S3b). Molecules with mixed functional groups were excluded. As shown in Supplementary Fig. S3a, an increase in the total number of -SO 3 H groups leads to an increase in the redox potentials; whereas an increase in the total number of -OH groups leads to a decrease in the redox potentials, as shown in Supplementary Fig. S3b. These trends are very much in line with the previous knowledge that the electron-withdrawing groups are known to increase the redox potential and vice-versa 26 . It must be noted that the correlation for -SO 3 H is weak, indicating that not only the type and the quantity of the functional group, but also the position of functionalization on the base molecules is decisive for the redox potentials 26 .
The calibration equation for the prediction of redox potentials versus SHE, using the PBE s (≡ T) calculated reaction energies is: The performance metrics of all the DFT functionals are given in Supplementary Table S2. Comparison of low-level methods: FF, SEQM and DFTB. After establishing the effectiveness of DFTbased methods, we next consider the computationally less costly methods for the optimization of geometries and the prediction of energies. As summarized in Fig. 3 and Supplementary Fig. S4, we employ different lower level methods such as FF, SEQM and DFTB for geometry optimizations. To obtain E T rxn of the reactions, we use the optimized structures of compounds and perform SPE calculations via the following three schemes: and M08-HX), on the molecular geometries obtained via scheme (I). III. SPE values are taken from DFT calculations with implicit solvation using three different functionals (PBE, B3LYP and M08-HX), on the molecular geometries obtained via scheme (I).
Several observations are made by comparing the R 2 and RMSE data across the various combinations of methods.

Comparisons within scheme (II).
When comparing predictions from SPE data of scheme (II) to PBE g benchmark (R 2 = 0.954, RMSE = 0.072 V), we make the following observations: • In Fig. 3b (solid bars), the performances of gas-phase DFT calculations of SPEs on gas-phase FF geometries (OPLS3e g : R 2 = 0.947, RMSE = 0.077 V) are significantly better than their counterparts from scheme (I). The same is also observed for gas-phase DFT calculations of SPEs on aqueous-phase FF geometries (OPLS3e aq : R 2 = 0.939, RMSE = 0.083 V). However, even after performing DFT calculations of SPEs, the OPLS3e aq performs worse than OPLS3e g. • In Fig. 3b (solid bars), gas-phase DFT calculations of SPEs on gas-phase SEQM geometries also show improved prediction accuracies with respect to their counterparts from scheme (I). The two best SEQM methods are AM1 g (R 2 = 0.963, RMSE = 0.064 V) and PM7 g (R 2 = 0.954, RMSE = 0.072 V), with performances equivalent to PBE g benchmark. Gas-phase DFT calculations of SPE on aqueous-phase SEQM geometries resulted in worse predictions for both AM1 aq (R 2 = 0.956, RMSE = 0.070 V) and PM7 aq (R 2 = 0.943, RMSE = 0.080 V), though they are still better with respect to their counterparts from scheme (I). • In Fig. 3b (solid bars), gas-phase DFT calculations of SPEs on gas-phase DFTB geometries also show slightly improved prediction accuracies and are slightly better than PBE g benchmark, with parameter sets DFTB-D3 g (R 2 = 0.960, RMSE = 0.067 V) and GFN1-XTB g (R 2 = 0.949, RMSE = 0.075 V).

Comparisons within scheme (III). Upon including implicit solvation effects during DFT calcula-
tions of SPE, the performances of low-level methods versus the corresponding PBE s benchmark (R 2 = 0.977, RMSE = 0.051 V) can be described as follows (please note that the subscript 's' represents gas-phase geometry optimization but with implicit solvation included while calculating SPE with DFT): • In Fig. 3b (dashed bars), RMSEs are lowered by 0.02 V for FF optimized geometries, both from gas-(OPLS3e g : R 2 = 0.969, RMSE = 0.059 V) and aqueous-phases (OPLS3e aq : R 2 = 0.964, RMSE = 0.063 V), in comparison to their counterparts from scheme (II). Surprisingly, the performances of these methods are close to PBE s benchmark. This shows that even though the thermochemistry with FF obtained energies is not accurate (as observed in Scheme I), the quinone molecule geometries from FF are good enough for performing DFT SPE calculations. • In Fig. 3b (dashed bars), prediction accuracies from SEQM optimized geometries are improved when compared to their counterparts in scheme (II). The two best SEQM methods are AM1 g (R 2 = 0.969, RMSE = 0.059 V) and PM7 g (R 2 = 0.976, RMSE = 0.051 V). Yet again, predictions from aqueous-phase SEQM geometries resulted in slightly worse results for both AM1 aq (R 2 = 0.961, RMSE = 0.066 V) and PM7 aq (R 2 = 0.962, RMSE = 0.065 V). Interestingly, the performances of these SEQM methods are close to PBE s benchmark. • In Fig. 3b (dashed bars), prediction accuracies from DFTB optimized geometries improve when compared to their counterparts in scheme (II). Strikingly, both sets of DFTB parameters, DFTB-D3 g (R 2 = 0.978, RMSE = 0.049 V) and GFN1-XTB g (R 2 = 0.977, RMSE = 0.051 V), perform better than PBE s benchmark.

Discussion
All the variations in computational methods that are used for geometry optimizations and SPE calculations, also with and without implicit solvation effects, are found to influence the prediction accuracies to varying degrees. First, for all methods, gas-phase DFT calculations of SPEs lead to significant improvements in prediction accuracies. According to these results, computationally demanding DFT geometry optimizations are hardly necessary for a first-order screening of large numbers of candidate molecules. Instead, either of SEQM or DFTB methods may be employed for the task of gas-phase geometry optimizations. Secondly, SPE calculations employing the PBE functional are generally better performing than SPEs obtained from computationally more costly B3LYP and M08-HX functionals. Thirdly, for all low-level methods, the inclusion of implicit solvation during DFT calculations of SPEs leads to improved prediction accuracies. Finally, the results confirm that the effects of geometry optimizations in aqueous-phase are minimal and they often result in slightly worse prediction accuracies. The calibration equation for the prediction of redox potentials versus SHE, using the DFTB-D3 g (≡ T) calculated reaction energies is: The performance metrics of all the low-level methods, as well as their combinations with high-level methods, are given in Supplementary Tables S3-S9. It is worth pointing out that a recent study by Tabor et al. 27 also considered DFT and SEQM based methods for performing a similar calibration exercise to determine an optimum method for predicting redox potentials of 28 quinone molecules using E rxn as a descriptor. In their study, at the DFT level, they considered the B3LYP functional with 6-311+G(d,p) basis set, which is very similar to the basis set used in this study. At the SEQM level, they considered the PM7 method. The MAD with gas-phase DFT was 0.056 V (vs. 0.058 V using PBE in this work) and with implicit aqueous solvation using the PCM model was 0.041 V (vs. 0.038 V using PBE with PBF solvation model in this work). When using the PM7 method, their predicted MAD value was 0.052 V in the gas phase (vs. 0.065 V in this work), and 0.067 V with the COSMO solvation model (vs. 0.067 V in this work). The difference between the results from the two studies are quite small www.nature.com/scientificreports/ and vary mostly in the third decimal place, which is not surprising given the common pool of molecules. For predicting redox potentials, we recommend using Eq. 1 not only because it has slightly better prediction accuracy but also because it covers a larger group of experimented molecules with a wider range of redox potentials.

Accuracy of predictions versus cost of calculations. In addition to determining accurate methods for
predicting the redox potentials of electroactive quinones, a major aim of the current study is to decide on the methods that are most suited for both standalone and HTCS studies for which a balance between the speed of computations and the accuracy of results is desireable. This is especially important when the robust DFT calculations become impractical for studies on a vast chemical space (10 3 ~ 10 6 compounds). Thus, an estimation of trade-offs between the computational accuracy and its cost is useful for efficient screening studies. For a comparison of the computation time, we selected a representative method for each level of theory, namely, OPLS3e (FF), PM7 (SEQM), DFTB-D3 (DFTB) and PBE (DFT). Next, noting that the geometry optimizations are usually the most computationally demanding processes, we optimized the geometries of all the molecules in gas-phase using these representative methods. We added the FF geometry optimization time to all other methods' calculation times, since we use it as the base method for performing all other geometry optimizations (as explained in Computational Workflow). An averaged computation time, as obtained from five different runs that actually showed no significant variation, is used to describe the relation between RMSEs versus the computational cost. As shown in Fig. 4a, DFTB-D3 is almost as accurate as DFT in predicting the redox potentials, while it requires substantially less (~ 10 3 times) computing time. Also noting that DFTB-D3 has previously been applied for calculations of large systems at relatively lower computational costs and with similar accuracies to that of the higher level (i.e. DFT-GGA) methods 41 , we suggest that for the small quinone redox compounds the DFTB-D3 method provides a good compromise between prediction accuracy and computational cost. Accordingly, for HTCS studies that are aimed to work on extremely large chemical spaces of molecules, we suggest DFTB-D3 computations on OPLS3e optimized geometries, which have an RMSE of 0.072 V that is close to the RMSE of 0.051 V of the DFT (PBE) calculations, as a way to accelerate the virtual screening of compounds.

Effects of geometry optimizations at various levels of theory and implicit solvation.
For the set of methods considered in the previous section, we also quantify the effect of gas-phase DFT calculation of SPEs by using the geometries that have been obtained via low-level methods. As shown in Fig. 4b (solid bars), the improvement in prediction accuracy, Δ e RMSE, is most significant for geometries optimized by OPLS3e (Δ e RMSE = 0.136 V), followed by PM7 (Δ e RMSE = 0.031 V), and then by DFTB-D3 (Δ e RMSE = 0.005 V). These results show that SEQM, and more pronouncedly, DFTB methods do not only predict the reaction energies accurately, but they also predict the geometries of the compounds as comparable to that of DFT. For the same set of methods, we also investigate relationships between the differences in molecular geometries and the calculated values of Δ e RMSE. By performing structure superposition analysis we compared the optimized geometries from different methods to the reference, gas-phase PBE optimized geometries. The average root-mean-square deviation (RMSD) of all the 86 reactant and product molecules under various atomic constraints are shown in Table 1.
First, under all constraints OPLS3e has the largest average RMSD with respect to PBE, which is expected. Secondly, when considering all atoms or only heavy atoms, PM7 and DFTB-D3 are very similar in geometrical difference with regards to PBE. Thirdly, it is surprising to note that DFTB-D3, while the most accurate of the low-level methods, does not necessarily provide the geometry closest to PBE (i.e. lowest average RMSD) when www.nature.com/scientificreports/ all atoms or non-hydrogen atoms are considered. However, when considering only the carbon atoms in the ring structure of molecules, DFTB-D3 produces structures that are closest to PBE. Given that the cyclic carbon atoms are a large fraction of the total number of atoms, it is possible that being able to represent the geometry of the rings accurately is what gives DFTB-D3 (Δ e RMSE = 0.005 V) an advantage over PM7 (Δ e RMSE = 0.031 V) and OPLS3e (Δ e RMSE = 0.136 V) for the redox potential predictions. Next, we quantify the improvements in prediction capability of the methods, with respect to gas-phase DFT calculation of SPE, due to the inclusion of implicit solvation effects, Δ s RMSE (dashed bars). As shown in Fig. 4b, an improvement in prediction accuracy is evident for all levels of theory and Δ s RMSE are similar at each level of theory. The decrease in RMSEs are 0.018, 0.021, 0.018, and 0.021 V for OPLS3e, PM7, DFTB-D3, and PBE, respectively. These results show that the amount of improvement due to the inclusion of implicit solvation is independent of the source of geometry. These findings are useful, for instance, when building machine learning (ML) models for the prediction of solvation energies directly from simple cheminformatics-based descriptors without a real need for an explicit knowledge of compound geometries.
In addition to the above findings, the results presented here are expected to be useful for generating accurate and large quantity chemical data on compounds, which can later be utilized by data-driven machine learning models for expanding the boundaries of search space during candidate compound explorations. Firstly, as has been argued in recent studies 27,29 , using the computationally costly quantum chemical calculations for millions of molecules, which is required for building powerful ML models, is still a major bottleneck. One of the key findings of this work is that the DFTB method is nearly as accurate as DFT when it comes to the prediction of quinone redox potentials. Thus, the data scarcity bottleneck can be addressed directly by using DFTB to generate large quantities of reliable reference data. Secondly, it has been demonstrated quantitatively in this work that the improvement due to the use of quantum chemistry methods, when compared to SEQM and DFTB methods, corresponds to an energy contribution that constitutes only a minor fraction of the total energy. An increasingly popular strategy that utilizes this fact for the property predictions at quantum chemical accuracy is Δ-ML 42 . In this strategy, by training on quantum chemical reference data, a ML model is trained to predict the correction terms, Δ, to values that have been computed using the inexpensive methods. This way, instead of performing quantum chemical calculations directly on all candidate compounds, the energy values are interpolated from a selected inexpensive baseline theory to quantum chemical accuracy. In our study we recommend a choice of methods, for both reference (e.g. M08-HX) and base line (e.g. DFTB-D3) data, to perform such Δ-ML tasks. Finally, we would like to emphasize again, that this work provides a general methodology for determining the optimum combination of methods with an example case of the quinone family of compounds. This knowledge is directly relevant for developing advanced decision-making frameworks that can automatically decide on the most optimum combination of computational methods for a target class of chemical compounds.

Methods
Thermodynamic principle. The reaction energy difference between the reactant and product is used as an approximation for the Gibbs free energy of proton-coupled electron transfer redox reactions. During a redox reaction in the aqueous-phase: The hydroquinone, QH 2 , compounds can be generated from the quinone, Q, compounds via a two-electron two-proton redox reaction 22 . A quantitative measure of the favorability of a given reaction is the change in standard Gibbs free energy, G o rxn . According to the Nernst equation, the equilibrium potential of a redox reaction, E o , is related to the change in the standard Gibbs free energy per coulomb of charge transferred during the electrochemical reaction as: where n = 2 is the number of electrons and F is the Faraday constant. Typically, E o is measured relative to the Standard Hydrogen Electrode (SHE) and G o rxn is computed at standard conditions. To calculate G o rxn we use the following equation: in which G o rxn is expressed simply as the difference in the standard free energies of the reactants and products. Thermodynamically, G o rxn can be described as a sum of contributions arising from the change in internal energy ( U rxn ), pressure-volume ( p V rxn ) and entropic ( T S rxn ) contributions due to reaction as: The change in internal energy can be further decomposed as U rxn = E rxn + ZPE , where E rxn is the reaction energy and ZPE is the change in zero-point energy. In the present work, the ZPE contributions to the internal energy, changes in pressure-volume, and entropic contributions are neglected, thus effectively using the approximation: where E aq represents the theoretically calculated internal energy of species in aqueous-phase. All the other contributions are ignored because they require extra calculation steps, computational resources, and thus, are not suitable from the perspective of HTCS. Nevertheless, the effects of ignoring these other contributions on the accuracy of predictions are discussed in the main text. In this work, the descriptor of choice, E rxn , is calculated with the inclusion of aqueous solvation effects, which requires additional computation time. To quantify the effect of solvation on prediction accuracy, another approximation is considered by ignoring solvation such that Eq. (7) can be rewritten using internal energies calculated in the gas-phase as: Under these set of approximations, the calculated change in internal energies E rxn from Eqs. (7) and (8) are linked to the measured redox potentials using Eq. (4). We used various theoretical methods, as explained below in computational details, to calculate E rxn , and discuss their performance for the prediction of experimentally measured redox potentials.

Computational details.
We developed a systematic computational approach involving one FF, nine SEQM, two DFTB, and eleven DFT methods, as well as their combination with implicit solvation environments, to predict the experimentally measured redox potentials of quinone-based electroactive. The MacroModel program is used for FF configurational searches and geometry optimizations, while the Jaguar program 38 is used for DFT calculations, all as implemented in the Schrödinger Materials Science Suite (version 2019-2). MOPAC and DFTB calculations are performed using the ADF program 43 . Molecular structures are optimized both in gas-and aqueous-phase using the OPLS3e FF, which provides a broad coverage of small compounds 36 . The gas-phase FF optimized geometries are used as inputs to perform gas-and aqueous-phase geometry optimizations using nine different SEQM methods, including AM1 44 51 . The aqueous-phase geometry optimizations at the SEQM level are performed using the COSMO-RS solvation model 52,53 . The choice of this solvation method is constrained by the present availability in the ADF program. The gas-phase FF optimized geometries are also used as inputs for DFTB optimizations using the DFTB-D3 54 and GFN1-xTB 55,56 methods. The DFTB-D3 computations are performed with a self-consistent charge cycle using the QuasiNANO-2015 parameter set 33 , while the parameters for GFN1-xTB are taken from the work of Grimme et al. 55,56 The aqueous-phase geometry optimizations of molecules are not performed with the DFTB method, since currently there is no available routine for this task in the ADF program. Finally, FF minimized geometries are used as inputs to perform geometry optimizations in gas-phase DFT calculations using different flavors of exchange-correlation functionals, including the local density approximation (LDA) 57 , generalized gradient approximation (GGA) 58 , hybrid and meta-GGA functionals 58 , all of which vary drastically in their accounting of the exchange-correlation energy. For the geometries that have been obtained from FF, SEQM and DFTB optimizations, the DFT level SPEs are computed in gas-phase, and subsequently in aqueous-phase using only the PBE, B3LYP, and M08-HX functionals, as they are well accepted in the community but also span a wide range of ways for accounting for the exchange-correlation effects 58 . A total of 11 exchangecorrelation functionals, also including some of the D3 dispersion 59 65 , and M08-HX 39 . Due to computational costs, the DFT aqueous-phase geometry optimizations are performed only with the following functionals: PBE, B3LYP, and M08-HX.
As DFT options in Jaguar, we chose "medium" grid density for OPT and "fine" grid density for SPE calculations. Energy and RMS density matrix change convergence criteria are set to the default values of 5.0 × 10 -5 and 5.0 × 10 -6 Hartree, respectively. The default direct inversion in the iterative subspace is employed as the convergence scheme. For OPT, Jaguar's mixed pseudospectral grids with default cutoffs are employed. For SPE calculations, we used pseudospectral grids with accurate cutoffs. To treat solvated molecules in water, we used the standard PBF solver with water as the solvent 37,38 . The calculations are performed with LACVP **++ basis set with polarization and diffuse functions 66,67 . The LACVP basis set is chosen here because it includes an effective core potential (ECP), which represents the effect of the core electrons in a parametrized form. The use of ECPs speeds up calculations on compounds that contain heavy elements. For the elements from H to Ar, LACVP and the widely employed 6-31G are essentially indistinguishable when evaluating the ground-state properties. The quinone molecules considered in this work contain the elements of C, H, O, N, S, F and Cl, and thus, the use of LACVP **++ basis set in this work is consistent with the use of 6-31G **++ basis set.
Calibration data and performance metrics. We collected redox potential data from 43 quinone redox couples in acidic aqueous solution 16,19,27,68 . In consideration of prediction accuracy and universality for the calibration models, the selection of available experimental data has been expanded within various quinone molecules, rather than using monotonous structural patterns. This is because compounds decorated with chemical functional groups usually show different redox properties as well as charge/discharge capacities when compared (6) G o rxn = U rxn + p V rxn − T S rxn Scientific Reports | (2020) 10:22149 | https://doi.org/10.1038/s41598-020-79153-w www.nature.com/scientificreports/ to their undecorated counterparts. The experimental molecules cover both quinone cores and their functionalized derivatives with multiple substituted groups including -SO 3 H, -COOH, -OH, -CH 3 , -F and -Cl (see Supplementary Table S1) and the redox data spans a broad range experimental redox potentials between −0.084 and 1.21 V. The redox couples are chosen consistently from measurements that were performed under similar experimental conditions, such as T = 298.15 K, pH = 0, and highly conducting salts. The correlations between experiments and calculations are expressed in terms of two commonly used coefficients, namely the coefficient of determination (R 2 ) and root-mean-square error (RMSE). R 2 and RMSE are calculated using the definitions from the Originlab. All benchmark simulations of calculation time have been performed on a single core of Intel Core i9-9960X 3.10 GHz CPU with Ubuntu 18.04 Bionic Beaver as the operating system.

Data availability
The generated computational data of compounds is provided in Supplementary Table S1.