Supervised learning of a chemistry functional with damped dispersion

Kohn–Sham density functional theory is widely used in chemistry, but no functional can accurately predict the whole range of chemical properties, although recent progress by some doubly hybrid functionals comes close. Here, we optimized a singly hybrid functional called CF22D with higher across-the-board accuracy for chemistry than most of the existing non-doubly hybrid functionals by using a flexible functional form that combines a global hybrid meta-nonseparable gradient approximation that depends on density and occupied orbitals with a damped dispersion term that depends on geometry. We optimized this energy functional by using a large database and performance-triggered iterative supervised training. We combined several databases to create a very large, combined database whose use demonstrated the good performance of CF22D on barrier heights, isomerization energies, thermochemistry, noncovalent interactions, radical and nonradical chemistry, small and large systems, simple and complex systems and transition-metal chemistry.

The rapid advances of computer capability and the progress of theoretical methods have significantly increased the accuracy of theoretical predictions of chemical, physical, biological, material and atmospheric processes. Relative energies, obtained by electronic structure calculations, are the dominant property controlling molecular and material stability and rate processes, and they play a central role in chemical modelling. Kohn-Sham density functional theory 1,2 (KS-DFT) has played a major role as the most popular electronic structure framework for modelling the relative energies of large molecules and materials. In principle, KS-DFT is exact, given an exact density functional. However, in practice, density functional approximations (DFAs) are necessary. By adding physical ingredients, enforcing relevant known constraints and optimizing against broader databases [3][4][5][6][7] , DFAs can be made more broadly accurate 8,9 , but existing functionals still leave much room for improvement 10 . Many functionals are accurate only for subsets of chemical properties, and only a few functionals (for example, the doubly hybrid functionals DSD-BLYP-D3(BJ) 11,12 , DSD-PBEP86-D3(BJ) 13 and B2GPPLYP-D3(BJ) 11,14 ) can be applied to make equally accurate predictions on diverse types of chemical systems, such as main-group molecules and transition-metal compounds, large and small systems, bonding and noncovalent (NC) interactions, stable molecules and transition states or radicals and closed-shell systems [3][4][5][6][7] .
An alternative approach to obtain relative energies is molecular mechanics (sometimes called force fields). In this approach, the relative potential energies are represented as functions of molecular coordinates and (optionally) partial atomic charges. This method has been used for more than 70 years, and additional examples are given in Supplementary Section 1.2.
A promising approach, mostly very recent, is to use Big Data and machine learning to improve energy functionals, of either the molecular mechanics or the density functional type. Another powerful development, also old but having advanced in recent years, is the addition Article https://doi.org/10.1038/s43588-022-00371-5 parameters are coefficients in a multi-term energy functional that minimizes a loss function. The loss function used here has two components: one measuring errors on a large database of molecular properties, which are mainly relative energies, and a second, regularization term that promotes the smoothness of the resulting energy functional. Supervised learning is used as a key part of the optimization process. The final energy functional obtained from this work is called Chemistry Functional 2022 with damped Dispersion (CF22D). Our workflow is summarized schematically in Fig. 1a.

Results
The functional form of CF22D and a discussion of how we optimized the functional are presented in Methods section with additional details in Supplementary Section 1. The parameters of the CF22D functional are given in Supplementary Table 1. To assess the performance of the CF22D functional, we compare the results of CF22D against those obtained with other representative functionals on several well-known databases, namely GMTKN55 (ref. 4), Minnesota DataBase 2019 (MDB2019) 3 , MGCDB84 (ref. 5) and the transition-metal data sets of CUAGAU42 (ref. 6) and TMC34 (ref. 7). The consolidated database DDB22 proposed in this work is also used for the assessment. All component data sets of DDB22 are shown in Fig. 1b with detailed explanations given in Supplementary Data 1.
of molecular mechanics terms to density functionals to form what one may call a combined quantum mechanical-molecular mechanical energy functional or, for brevity, an energy functional. This broadens the search for DFAs to a search for such more broadly defined energy functionals that can take advantage of both density functionals and molecular mechanics. The present article uses supervised learning to optimize such an energy functional. Supplementary Section 1.2 gives additional references regarding the use of Big Data and machine learning to improve energy functionals and the addition of molecular mechanics terms to density functionals.
In practice, most modern functionals contain parameters that are adjusted in whole or in part to obtain better agreement with experimental data (or, in limited amounts, high-level theoretical data), and the broad advances in the use of machine learning and Big Data now enable ways to train density functionals with larger and more complex data sets. There are functionals with a variety of different combinations of ingredients, and including different ingredients is one way to improve the accuracy. The work presented here differs from previous efforts in that we start with a functional form (the MN15 functional 15 ) for a density functional that has already proved successful when optimized with smaller databases, combine it with a molecular mechanics term to account for long-range dispersion interactions and use supervised learning and a large database organized into multiple data sets to simultaneously learn optimum parameters for both components. The form of the MN15 functional was selected for its outstanding performance in early tests and its flexible functional form of nonseparable exchange-correlation energy.
The input to a machine-learning algorithm is a set of physical descriptors, and the output is the set of parameters determining the energy as a function of the descriptors. In the approach used here, each term in the MN15 functional is regarded as a physical descriptor, and we also use the molecular geometry as a descriptor. Consequently, the input is a set of integrals of various functionals of the electron density for a set of molecules and the geometries of these molecules, and the  Fig. 1 | Method and database for the development of CF22D. a, The workflow of the development of CF22D. The criterion in the validation step is that, if the MUE of the trial functional for one data set in the validation set is 30% higher than the average MUE of the top five functionals for this data set on the basis of results from ref. 5 , then this data set is moved from the validation set to the training set based on supervised learning. A new training database is thereby obtained, and the optimization procedure then goes back to the training step. If the MUE of the current training database is converged, and there is no new validation set to be moved into the training set, the procedure ends. b, The DDB22 database. Bold text indicates data sets belonging to the training set (see Supplementary Data  1 and Supplementary Table 2 for more details). The orange data sets contain barrier heights (BH), the green data sets contain isomerization energies (IE), the purple data sets contain noncovalent interactions (NC), the golden data sets contain thermochemical properties (TC), the pink data sets contain excitation energies (EE), the dark-blue data sets contain molecular structural data (MS), the brown data set contains dipole moments (DM) and the others are coloured black (transition metals, TM).     Article https://doi.org/10.1038/s43588-022-00371-5 The functionals against which we compare are listed with references in Supplementary Table 3, where they are separated into groups on the basis of their ingredients. We especially note the category of doubly hybrid functionals 16,17 , which include correlation contributions based on unoccupied orbitals. This can add accuracy but also increases the cost. The functionals considered for each sub-database are specified in Tables 1 and 2. Since the doubly hybrid functionals are more expensive than the other functionals and the recent deep learning functional DM21 is quite different from the other functionals, we first compare only the 29 other functionals in Supplementary Table 4. For brevity, we call these ordinary functionals.

Performance on the GMTKN55 database
The GMTKN55 database, consolidated by Grimme and coworkers 4 , covers thermochemistry (TC), kinetics and NC interactions of maingroup elements. Morgante and Peverati 18 pointed out that GMTKN55 has more accurate reference values than MGCDB84, because the latter was mainly built based on GMTKN30, which is a predecessor version of GMTKN55. Therefore, the GMTKN55 database was selected to benchmark the performance of CF22D for general chemical properties of main-group elements.
The 1,505 data of GMTKN55 can be partitioned into five sub-databases, namely basic properties and reaction energies for small systems (the 'small' sub-database, comprising 18 data sets with 473 data), reaction energies for large systems and isomerization reactions ('large', comprising 9 data sets with 243 data), reaction BHs ('BH', comprising 7 data sets with 194 data), intermolecular NC interactions ('inter-NC', comprising 12 data sets with 304 data) and intramolecular NC interactions ('intra-NC', comprising 9 data sets with 291 data). Another classification is to divide the 55 data sets into two sub-databases: Radical7 and Nonradical48 (refs. 4,19 ). The former includes the G21EA, G21IP, SIE4x4, ALKBDE10, HEAVYSB11, RC21 and RSE43 data sets, while the latter includes the rest of the data sets in GMTKN55.
Goerigk et al. introduced the weighted total mean absolute deviation (WTMAD) measures WTMAD-1 and WTMAD-2 (refs. 4,20 ) for comparison of the performance of density functionals on GMTKN55. Explanations of WTMAD-1 and WTMAD-2 are given in Supplementary Section 2.2. Supplementary Table 6 and Supplementary Data 2 give the resulting WTMAD results. Supplementary Table 4 provides the mean unsigned error (MUE), and Supplementary Table 7 provides the mean of the mean absolute error (MoM). CF22D gives the lowest MUE and the second lowest WTMAD-1 and WTMAD-2 among the 29 ordinary functionals, whereas ωB97M-V gives the second lowest MUE and the lowest WTMAD-1 and WTMAD-2. We conclude that the CF22D and ωB97M-V functionals perform similarly well on the GMTKN55 benchmark data for main-group chemistry.
For the Large category, CF22D is the best-performing functional and in particular outperforms all five doubly hybrid functionals and the DM21 functional. In this category, it is especially interesting to discuss the MB16-43 data set in GMTKN55 (ref. 4 ). MB16-43 was proposed in the spirit of 'mindless benchmarking' 21  The results when using the doubly hybrid functionals and the deep learning functional for Radical7 and Nonradical48 are compared with ordinary functionals in another way in Fig. 2b. We see that doubly hybrid functionals are mostly located in the lower left corner of the graph. B2GPPLYP-D3(BJ) and DSD-BLYP-D3(BJ) are the best-performing doubly hybrid functionals for the Radical7 and Nonradical48 sub-databases, respectively. We also see that CF22D is the only functional without doubly hybrid character that lies in the lower left corner, again demonstrating its excellent and balanced performance for both radical and nonradical systems. In fact, the performance of CF22D is comparable to some of the doubly hybrid functionals. For instance, CF22D gives lower MUEs for both Radical7 and Nonradical48 as compared with PWPB95-D3(BJ). For Nonradical48, CF22D also performs better than the other two examined doubly hybrid functionals (B2PLYP-D3(BJ) and MPW2PLYP-D3(BJ)). In addition, as compared with the state-of-the-art deep learning functional DM21, CF22D gives better performance for both Radical7 and Nonradical48. We conclude that the performance of CF22D is competitive with DM21 and CF22D shows high accuracy across diverse types of chemical properties.

Performance on the AME418 sub-database
The AME418 sub-database of MDB2019 is in the training set for optimization of the CF22D functional. As shown in Fig. 3, CF22D gives the A portion (350 data points) of AME418 was divided into singlereference systems (SR297) and multi-reference systems (MR53) 15,22 . For SR297, CF22D performs the best (MUE of 1.57 kcal mol −1 ). For MR53, the performance of CF22D ranks eighth with an MUE (5.37 kcal mol −1 ) that is much better than the average (7.97 kcal mol −1 ) but not as good as the best (3.96 kcal mol −1 for MN15-L).
The full AME418 database is next subdivided into eight sub-databases: main-group bond energies (MGBE136), transition-metal bond energies (TMBE30), BHs (BH76/18), NC interactions (NC51/18), excitation energies (EE18), isomerization energies (IEs) (IsoE14), hydrocarbon TC (HCTC20) and miscellaneous (Misc73). As shown in Supplementary Data 4, CF22D gives the second best results for the MGBE136, EE18 and IsoE14 sub-databases, the third best results for the NC51/18 and HCTC20 sub-databases and the fourth best result for the Misc73 subdatabase. On the remaining two sub-databases (TMBE30 and BH76/18) it does not rank in the top nine, but its MUEs of 6.61 and 1.53 kcal mol −1 are still considerably better than the average for these sub-databases of 8.79 and 3.31 kcal mol −1 , respectively.

Performance on the MGCD84 database
The MGCDB84 database 23 has 4,986 data. The data for NC interactions and thermochemical properties account for 41.7% and 24.2% of the database, respectively. Portions of MGCDB84 were used in training ωB97M-V and CF22D. For MGCDB84, CF22D has an MUE of 0.80 kcal mol −1 , behind only ωB97M-V (with an MUE of 0.71 kcal mol −1 ). The MN15 and MN15-D3(BJ) functionals are the seventh and eighth best in our comparison, with MUEs of 1.18 and 1.20 kcal mol −1 , respectively. The improvement of CF22D with respect to MN15 and MN15-D3(BJ), which share the same functional form for the density functional form, is a measure of the improvement made by the present supervised learning optimization.
The MGCDB84 database is divided 5 into eight sub-databases: NC 'easy' dimers (NCED, 1,744 data), NC 'easy' clusters (NCEC, 243 data), NC 'difficult' interactions (NCD, 91 data), 'easy' IEs (EIE, 755 data), 'difficult' IEs (DIE, 155 data), TC 'easy' (TCE, 947 data), TC 'difficult' (TCD, 258 data) and BHs (BH, 206 data). The RG10 (569 data) and AE18 (18 data) data sets do not fall into any of these sub-databases. Supplementary Table 8 presents the MGCDB84 results for 27 functionals (see Table  2 for details about the compared functionals) listed in order of their overall MUEs. CF22D gives the best performance on the NCD, TCE and TCD sub-databases and the second lowest MUE for the DIE and BH sub-databases. CF22D is among the five best-performing functionals for all eight sub-databases.
Of the 6,075 ground-state energies, 2,866 were used for training and 3,209 were used only for testing. Supplementary Table 9 shows that CF22D is the best-performing functional for both the training and the non-training (testing) sub-databases, with MUEs of 1.34 and 0.75 kcal mol −1 , respectively. The smaller MUE for the non-training data as compared with the training data apparently arises because the non-training data are easier to predict. The MUE averaged over 25 functionals is 45% smaller for the non-training data, and the corresponding percentage for CF22D is 44%. The commensurate performance across the two subsets provides evidence that the training does not suffer from overfitting and indicates good transferability of the prediction accuracy for the ground-state chemical properties.
The data in GSE6075 can be classified into four types of four chemical properties: BH, NC, IE and TC. Supplementary Table 10 shows that, among the 25 functionals compared (see Table 1  We can divide each of the four classes into training and testing data, and Supplementary Table 10 shows that CF22D has the best performance for six of them (BH_test112, NC_training936, IE_training293, IE_test826, TC_training1431 and TC_test402 categories), the second best for the BH_training206 category and the fifth best for the NC_test1869 category. The comparisons presented in Supplementary Tables 9 and 10 show that CF22D gives excellent performance for various types of properties and demonstrate that the predictive accuracy of the CF22D functional is highly transferable to properties that are not in the training set.
For the ground-state energies in DDB22, CF22D is not only the best-performing functional for the full set of 6,075 data among the 25 selected representative functionals but also the best functional for each of the four sub-databases NC, TC, BH and IE. We found that CF22D also shows excellent transferability on the diverse non-training test sets of transition-metal chemistry, including CUAGAU42, TMC34, TMBH22 and WCCR9. The MUE of CF22D for the whole set of 107 testing transition-metal data (that were not used for training) is 2.77 kcal mol −1 (Supplementary Table 11), which is the best among all the tested functionals. Especially for the CUAGAU42 and TMC34 data sets, we can compare the performance with ωB97M-V (Fig. 4). CF22D gives the best performance on the CUAGAU42 data set and also the best results overall. Detailed results for CF22D's performance on the CUAGAU42 and TMC34 databases can be found in Supplementary Section 2.5.1. CF22D is demonstrated to be an excellent energy functional for 'complex' systems with an MUE of 2.84 kcal mol −1 for the 886 data classified as complex (Supplementary Table 14). Figure 5a shows the potential energy curves of benzene-Ar calculated by three DFAs (M06-SX, revM06 and MN15) and two energy functionals with molecular mechanics (MN15-D3(BJ) and CF22D). The former DFAs, because they do not have nonlocal correlation and hence do not have long-range dispersion 24 , give curves that decay to zero quickly from 4.5 to 6.0 Å. The dispersion-corrected functional MN15-D3(BJ) shows a negligible long-range tail because the damped dispersion term for MN15 was added without re-optimizing the functional form. Since MN15 gives reasonably good results in the van der Waals region (because it contains a medium-range correlation energy 25,26 ), only a small, damped dispersion term was added. CF22D shows good agreement with the reference values both at the equilibrium position and in the long-range region in Fig. 5a.

Dispersion interactions
In Fig. 5b, CF22D shows similar good results for benzene-SiH 4 . This figure also shows that B3LYP-D3(BJ) provides a reliable long-range van der Waals tail but that, at the equilibrium position, it overestimates the benzene-SiH 4 binding energy by about 0.21 kcal mol −1 . The geometries and reference energies of benzene-Ar and benzene-SiH 4 are obtained from ref. 27 . Overall, CF22D provides generally reliable predictions for NC interactions, not only for the binding energies near the equilibrium distance but also for the weak interactions at long distance.

Other results
Results for electronic excitation energies, dipole moments, molecular structures, basis set superposition errors and grid errors, binding energies of extra-large complexes (ExL7) 25 , reactions of open-shell singlereference transition metal complexes (ROST61) 28 and the CUAGAU-2 (ref. 29 ) data set are presented in Supplementary Tables 16-22. CF22D outperforms the selected non-doubly hybrid functionals, especially for ExL7 and the CUAGAU-2 data sets (Supplementary Tables 20 and 22). For the ROST61 data set, the MUE results for the doubly hybrid functionals with a molecular-mechanics damped-dispersion term are listed in Supplementary Table 21, all being lower than 3 kcal mol −1 . The average value of the results for the functionals with a molecular-mechanics damped-dispersion term is 3.36 kcal mol −1 , whereas the average value of the results of non-doubly hybrid functionals is 4.64 kcal mol −1 . CF22D performs well with an MUE of 4.03 kcal mol −1 , which is better than the average MUE of the non-doubly hybrid functionals.

Discussion
Density functional theory (DFT) is the most popular electronic structure method, but many functionals are optimized only against limited specific groups of chemical properties, and few functionals can be applied to accurately predict all the properties required for complex chemical applications. We used physical descriptors, broad databases and supervised learning for the systematic optimization of a flexible functional form including the simultaneous optimization of a molecular-mechanics damped-dispersion term. As shown in Results section, CF22D can be recommended for applications involving a broad range of bonding and NC interactions of both main-group and transitionmetal compounds, which makes it appropriate for studies of catalysis, functional materials, biochemistry and environmental chemistry. However, as a global hybrid functional, CF22D has limitations because it contains Hartree-Fock (HF) exchange, even at long range: (1) it is not economical for plane wave codes because the treatment of long-range HF exchange in plane wave codes requires a fine mesh for integration over the Brillouin zone 30 , (2) long-range HF exchange causes a divergence of the group velocity at the Fermi level for solid-state systems (such as metals) that do not have a gap 31,32 and (3) HF exchange is known to cause a static correlation error 33 , although this is ameliorated in the present functional by parameterization to a training set with a high representation of strongly correlated systems. Another limitation is that the long-range dispersion terms do not take account of the partial atomic charge distributions in the interacting subsystems. Equation (1) is an energy functional based on seven features: spinup and spin-down electron density, spin-up and spin-down electron density gradient, spin-up and spin-down kinetic energy density and the set of internuclear distances (which are the geometries of the dimers embedded in the molecule). In the future, one may envision more general energy functionals in which the energy also depends on other variables such as the geometries of the trimers embedded in the molecule or other features (for example, ionization potentials) of the atoms, dimers and/or trimers embedded in the molecule. Thus, the energy functional considered here may be considered to be just an example of a move toward more complex energy functionals with a greater variety of features. It has been stated that "Feature selection methods provides us a way of reducing computation time, improving prediction performance, and a better understanding of the data in machine learning" 34 . Therefore, we see a future for density functional theory that may involve combining traditional functionals with functionals of other variables to produce machine learning functionals with even better combinations of accuracy and efficiency.

Methods
Basing the loss function and the additional testing of the output functional on broad and diverse databases is a key aspect in the present work. We train the functional with a database including nearly 3,000 data. The training data are organized into a variety of energetic data sets for different categories of energies, and we also consider subdatabases encompassing subsets of the data sets. An additional set of about 3,800 data not used for training are used as a testing set. The testing set includes BHs, NC interactions, TC, IEs, excitation energies, bond lengths and dipole moments.

The density functional term
Our energy functional has two kinds of terms: a DFA and a molecularmechanics term representing damped dispersion. The functional form is where the E DF is an exchange-correlation term with the functional form of the successful MN15 functional and E disp is a molecularmechanics term that is conventionally called a damped dispersion term. Note that the damped dispersion term accounts for more than dispersion at short range, and dispersion is not uniquely defined for geometries where there is overlap of the wave functions of interacting subsystems. The parameters in E DF were optimized simultaneously with a parameter in E disp . For E DF , we chose the form of the previously successful MN15 functional 15 . This is a linear combination of the nonlocal singledeterminant exchange energy E HF x , a local nonseparable exchange-correlation energy E nxc and an additional correlation energy E c : where X is the percentage of HF exchange E HF x , ρ α and ρ β are the up-spin and down-spin electron densities at the spatial point r, ρ is their sum, τ α and τ β are the spin-up and spin-down kinetic energy density and the functions v xσ , u xσ , w σ , ε LSDA xσ , ε LSDA C and H PBE are the same as used in the MN15 functional 15 and are therefore not re-explained here. The parameters X, a ijk , b i and c i in equations (2)(3)(4) of CF22D are shown in Supplementary Table 1.

Damped dispersion
The DFT-D3(0) model 35 is the starting point for the molecular-mechanics term used here. The D3(0) treatment has r AB -6 and r AB -8 terms, where r AB is the distance between atoms A and B, but only the r AB −6 term is used in the present work because our goal is to obtain only the longest-range dispersion term by molecular mechanics. The term we use has the unscaled form where the sum includes all the atom pairs in the system, C AB 6 is the D3(0) dispersion coefficient that depends on the atomic coordination numbers CNA and CN B , which depend on the system's geometry, and where s r,6 is a scaling parameter optimized in the present work and R AB 0 is the pair-specific cut-off radius parameterized in DFT-D3(0) for the 4,465 values of all atom pairs AB composed of the first 94 elements of the Periodic Table 35 . The optimization method of s r,6 for CF22D is presented in Supplementary Section 1, and the resulting value of s r,6 is provided in Supplementary Table 1.

The loss function
The loss function is in which and K is the number of training data sets, R n is the r.m.s. error (RMSE) for data set n in Supplementary Data 5, I n is the inverse weight of subset n, λ(a + b + c) is an L 2 regularization term that serves as a smoothness restraint 36,37 and λ is a smoothing coefficient 37 that was set to 0.01 for CF22D.
Article https://doi.org/10.1038/s43588-022-00371-5 The value of the loss function depends on the inverse weights. Our goal in training the energy functional was to obtain small errors across the board, that is, relatively small errors for as many data sets and sub-databases as possible, not to simply reduce the overall mean unsigned error for the entire training data set or the absolute value of the loss function. The final selection of the inverse weights was therefore determined iteratively by substantial trial and error to obtain uniformly good performance across the full collection of data sets, as discussed below.

The DDB22 database
In this work, we built a combined database called the Diverse Database 2022 (DDB22), which includes 155 data sets made up of a total of 6,572 data. All the component data sets are shown in Fig. 1b, 7 as representative of a much larger database of metalorganic reaction energies, dissociation energies of diatomic transition-metal species and reaction barriers involving complexes of second-and third-row transition metals. It is divided into the TC data sets TMD10 and MOR13 and the BH data set TMB11. • The CUAGAU42 database of Chan 6 for small copper, silver and gold compounds. It contains two data sets: CUAGAU_TC27 for TC and CUAGAU_IE15 for IEs.
Data sets from various databases have some degree of overlap. The MGCDB84 database includes the GMTKN30 (ref. 40 ) database (a predecessor of GMTKN55 that is partially represented and partially updated in GMTKN55) and previous Minnesota databases, and the GMTKN55 database also has some overlap with previous Minnesota databases. The overlapping data of MDB2019, GMTKN55 and MGCDB84 are shown in Fig. 1b (see Supplementary Table 2 for more details on these overlaps and how they were resolved to create the consolidated database).
We used the entire DDB22 to compare the performance of the CF22D functional with selected other functionals, but only a portion of it was used for the training and validation steps. For some of the discussion, to better understand the validation and testing tests, we divide DDB22 into four sub-databases: • Ground-state energies (sub-database GSE6075, with energies in kcal mol −1 ) that consists of 6,075 data of ground-state energetic data from 13 data sets of BHs, 44 NC interaction energy data sets, 30 IEs data sets and 55 TC data sets (this sub-database contains 6,057 relative energies and 18 absolute atomic energies) • Excitation energies (EE157, with electronic excitation energies in eV), consisting of 157 data of excitation energetic data from ten data sets • Molecular structures (MS261, with interatomic distances in Å) consisting of 261 data from five molecular structure data sets • Dipole moments (DM79, with dipole moments in Debye) consisting of 79 data from one database of dipole moments These classifications are specified in detail in Supplementary Data 1.

Training
Our learning scheme involves performance-triggered iterative supervised training. For brevity, we call this supervised learning. Our supervised learning scheme differs from the active learning schemes that were developed for labelling problems. In those cases, the machine queries the supervisor about troublesome unlabelled data, and the supervisor labels the data 41 . Our application is in the regression and prediction area rather than the labelling area. Our supervised learning scheme is closer to the active learning method developed by Zhang et al. 42 for neural net modelling of force fields, but with some differences because we group data into data sets of related data and because we do not use a neural net. Our method also differs from machine learning schemes that divide the data randomly among the training and validation sets in that we divide the data in a more organized fashion using the data sets. The three steps in our supervised learning, following the development of the initial model with an initial training set, are as follows: (1) wider testing in a step that replaces the conventional validation step with one that uses the current model to explore additional data sets spanning a broader domain than had been used to develop the existing model and identifies poorly fit data sets; (2) augmentation, in which we add the troublesome data sets to the training set; (3) retraining. The machine develops a model based on the augmented data. We then repeat these steps until convergence is reached. An active learning scheme with this kind of sequence was presented by Schmidt et al. 43 . They described their active learning schemes as follows: "(i) A surrogate model has to be developed; (ii) Based on the prediction of the surrogate model, optimal infill points have to be chosen in order to retrain the surrogate model and finally find the optimum.".
Our workflow to implement the above supervised learning method is summarized schematically in Fig. 1a. Here we provide a detailed description: 1. We select 79 data sets (data sets 1-79 from AME418 and MGCDB84, listed in Supplementary Data 5) with a total of 1,886 data as the initial training set. The initial inverse weight of each data set in AME418 is the same as the one utilized in the final optimization of the M06-SX functional 39 . The initial inverse weight of each selected data set in MGCDB84 is chosen as the average MUE for that data set as averaged over 200 exchange-correlation functionals (previously published and developed by many different groups) as given in the original MGCDB84 article 5 .
Note that Supplementary Data 5 shows 92 data sets with 3,694 data. Data sets 80-92 with 1,808 data constitute the initial validation set. We also initialize the s r, 6 parameter in the damped dispersion. Using the standard notation, data sets 1-79 are training data and data sets 80-92 are initially validation data, but some of them are converted to training data by the supervised learning procedure of step 6. The testing data are described in Article https://doi.org/10.1038/s43588-022-00371-5 2. The electron densities of all systems in the training set are calculated by using the MN15 functional and applied as the initial densities. 3. Each descriptor in the CF22D functional described by equation (1) is calculated for all the systems in the training set based on the electron densities generated by the functional of the previous step (step 2 in the first iteration and step 6 in subsequent ones). The R n value of each data set in equation (8) can be expressed as a function of s r, 6 in equation (7) and the coefficients in the density functional term, namely X in equation (2), a ijk in equation (3) and b i and c i in equation (4). Thus, the loss function of equation (8) is a function of those variables and the I n of each subset. 4. The loss function of equation (8) is minimized using the generalized reduced gradient nonlinear algorithm for a given s r, 6 value, and the s r,6 value in equation (7) is varied to minimize the MUE of the training set. (Initially we vary the value of s r,6 from 1.2 to 2.1 Å with an initial interval of 0.1 Å, but the interval is gradually reduced.) This yields a new value of s r, 6 , and a new set of density functional coefficients is obtained. 5. Using the trial functional obtained from step 4, the energies of all systems in the training and validation sets are calculated, and the MUE for each data set in the training and validation sets is calculated. 6. This is the supervised learning step. If the MUE of the trial functional for one data set in the validation set is 30% higher than the average MUE of the top five functionals for this data set based on the results from ref. 5 , then this data set is moved to the training set with the inverse weight determined by the same method as used in step 1. We then modify selected inverse weights (both the ones inherited from previous steps and the new ones) to improve, if possible, the performance on the various sub-databases where we wish to reduce the error to obtain small errors across the board. The final selection of inverse weights in this step is determined by substantial trial and error to try to obtain uniformly good performance across the full collection of data sets. 7. If a validation data set is moved into the training set in step 3, the electron densities of all the systems in the training set are recalculated, and we return to step 3. If no new data set is moved in step 6, we compare the MUE of the training set with the value in the previous iteration. If the MUE is not converged, the electron densities of all systems in the training set are recalculated, and we return to step 3. If the MUE of the training database is converged, we proceed to step 8. 8. At convergence, the results of CF22D for all the training and test data sets are calculated and compared with other density functionals.
After five rounds of iteration and validation, the supervised learning added ten data sets containing 1,033 data (data sets 80-89 in Supplementary Data 5). The data sets added by supervised learning are all from the BH76 group (BH76RC and DBH24) and the W4-11 group (HAT707MR, HAT707nonMR, BDE99MR, BDE99nonMR, TAE140MR, TAE140nonMR, ISOMERIZATION20 and SN13) of the MGCDB84 database.
In the final iteration, s r,6 = 1.53 Å, with which the overall MUE of the selected data sets was the lowest (Supplementary Section 1). The optimized parameters of the CF22D functional are given in Supplementary Table 1.

Computational details
The CF22D calculations were performed using a locally modified version of Gaussian 16 revision A.03 (ref. 44 ), while all the calculations with the other functionals in this work were performed using the unmodified Gaussian 16, revision A.03.
The basis sets, molecular geometries and quadrature grids for the calculations on MDB2019 (refs. 3,22,39 ) were the same as those employed in our previous works 22,39,45 and can be found in Supplementary Table  21 of ref. 22 . For the calculations on the GMTKN55 (ref. 4 ) database, MGCDB84 (ref. 5 ) database, transition-metal data sets TMC34 (ref. 7 ) and CUAGAU42 (ref. 6 ), the settings were the same as those employed in the original papers. The basis set is mainly def2-QZVP for GMTKN55 (diffuse functions were applied to some atoms in some of the data sets, and core electrons of heavy elements in some molecules of HEAVYSB11, HAL59 and HEAVY28 were replaced by the def2-ECP effective core potentials). The basis set is def2-QZVPPD for MGCDB84. The basis sets are def2-QZVPP for CUAGAU42, CUAGAU-2 and ROST61, def2-TZVP for TMC34, and cc-pVTZ for ExL7.
A (99, 590) grid (99 radial shells with 590 grid points per shell) was used for all of the data sets, except AE18 and RG10, for which a (500, 974) grid was used.

Additional data and references
Additional data from this study and additional references are provided in the Supplementary Information.

Data availability
The optimized parameters of the CF22D functional are available in Supplementary