Abstract
Quantum-mechanical methods are used for understanding molecular interactions throughout the natural sciences. Quantum diffusion Monte Carlo (DMC) and coupled cluster with single, double, and perturbative triple excitations [CCSD(T)] are state-of-the-art trusted wavefunction methods that have been shown to yield accurate interaction energies for small organic molecules. These methods provide valuable reference information for widely used semiempirical and machine learning potentials, especially where experimental information is scarce. However, agreement for systems beyond small molecules is a crucial remaining milestone for cementing the benchmark accuracy of these methods. We show that CCSD(T) and DMC interaction energies are not consistent for a set of polarizable supramolecules. Whilst there is agreement for some of the complexes, in a few key systems disagreements of up to 8 kcal mol^{−1} remain. These findings thus indicate that more caution is required when aiming at reproducible noncovalent interactions between extended molecules.
Introduction
The most accurate methods for studying matter at the atomic scale are wavefunction-based approaches, which explicitly account for many-electron interactions. Given only the positions and nuclear charges of atoms, we can now predict, among essentially all observable properties, the binding strength of relatively small molecular systems (i.e., <50 atoms) to within a few tenths of a kcal mol^{−1} using many-body solutions to the Schrödinger equation^{1,2,3}. This value is better than the so-called “chemical accuracy” of 1 kcal mol^{−1} required for reliable predictions of thermodynamic properties. Indeed, the relative stabilities of many noncovalently bound materials such as 2D layered materials, pharmaceutical drugs, and different polymorphs of ice, are underpinned by small energy differences on the order of tenths of a kcal mol^{−1}^{4}. However, experimentally determining binding affinities under well-defined, pristine conditions is notoriously challenging^{5}. In addition, thousands of computational works describe physical interactions in materials, which are not well understood at the experimental level, for instance, as part of rational design initiatives in novel materials including soft colloidal matter, nanostructures, metal organic, and covalent organic frameworks^{6,7,8}. The present shortage of benchmark information is a major setback for forming reliable predictions across the natural sciences and is frequently addressed through demanding, but increasingly feasible, wavefunction-based methods. However, extending the use of highly accurate methods to a regime of larger molecules is hindered by theoretical and technical challenges due to the steep increase in computational cost required for an accurate description of many-electron interactions^{9,10}.
Here we use two widely trusted wavefunction methods that can provide subchemically accurate solutions to the electronic Schrödinger equation for noncovalent interactions. First, we utilize coupled-cluster (CC) theory with single, double, and perturbative triple excitations [CCSD(T)]^{11}, approximated via the local natural orbital (LNO) scheme to be practicable [LNO-CCSD(T)]^{12,13}. Coupled cluster theory has gained great prominence in the last 30 years and the label of ‘gold-standard’ for remarkable accuracy on virtually all systems in its domain of applicability^{14}. Second, we use fixed-node diffusion Monte Carlo (FNDMC)^{15}, a stochastic quantum method that computes the energy for the many-electron wavefunction directly. This method has seen a surge of use in recent years, particularly for predicting large molecules and periodic systems with noncovalent interactions^{10,16,17}, such as molecular crystals^{18,19} and adsorption on 2D materials^{16,20,21,22}. The accuracy and suitability of FNDMC in complex noncovalently bound extended materials have been established through excellent agreement with a wealth of different experiments. For example, FNDMC has accurately predicted the binding energy of bilayer graphene^{17} and the cohesive energies of water ice polymorphs^{18}, as well as of carbon dioxide, ammonia, benzene, naphthalene, and anthracene crystals^{19}. These constitute highly polarizable materials with significant long-range van der Waals interactions.
As we demonstrate in Fig. 1 and Table 1, CCSD(T) and FNDMC interaction energies are in subchemical agreement in small systems such as the benzene-water dimer^{21} and the dimers of benzene, pyridine, and uracil. Nonetheless, FNDMC and CCSD(T) are still prohibitively expensive for most applications in biology and chemistry, and as a result, very little is known about how predictive these theoretical methods are in the regime of medium-to-large polarizable molecules.
Straightforward extrapolations of interactions from small molecules to large complexes are difficult to make due to the interplay and accumulation of interactions that are nonadditive, anisotropic, or have many-body character^{21,23,24,25,26}. As such, a deeper understanding of noncovalent interactions can be gained by directly applying state-of-the-art methods in larger molecular complexes. Here, we use frequently studied molecular data sets: a subset of the S66 by Řezáč et al.^{27} and the full L7 molecular data set from Sedlak et al.^{28} to ascertain the predictive power of FNDMC and CCSD(T) for medium to relatively large complexes involving intricate π − π stacking, electrostatic interactions, and hydrogen-bonding (see Fig. 2). In addition, we consider a larger system of a C_{60} buckyball inside a [6]cycloparaphenyleneacetylene ring (which we label as C_{60}@[6]CPPA), consisting of 132 atoms. This structure has a number of interesting features: (i) an open framework that can be found in covalent organic frameworks and carbon nanotubes, (ii) the buckyball has a large polarizability (76 ± 8 Å^{3})^{29} which gives rise to considerable dispersion interactions, and (iii) confinement between the ring and the buckyball that may cause nontrivial long-range repulsive interactions^{30,31}.
Following recent algorithmic advances for more efficient CCSD(T) and FNDMC, we predict interaction energies for a set of medium-sized polarizable organic dimers and supramolecular complexes, and converge numerical thresholds to the best of our joint knowledge and expertise. Hereafter, we refer to CCSD(T) and FNDMC interaction energies but note that a number of approximations are used in both methods. More specifically, the CCSD(T) interaction energies we report come from systematically converging LNO-CCSD(T) toward canonical CCSD(T) and are accompanied by corresponding uncertainty estimates. Meanwhile, the significance of approximations in FNDMC interaction energies is assessed using statistical measures where error bars indicate 95% confidence intervals. Furthermore, to define agreement between CCSD(T) and FNDMC energies, we take into account the uncertainty estimates and a physically relevant energy window, namely k_{B}T at room temperature, or 0.6 kcal mol^{−1}. First, interaction energies that differ by less than the combined error estimates from CCSD(T) and FNDMC are statistically indistinguishable. Second, interaction energies that differ by <0.6 kcal mol^{−1} outside of the combined error estimates are thermodynamically consistent. Third, interaction energies that differ by more than 0.6 kcal mol^{−1} outside of the combined error estimates are inconsistent, indicating disagreement between the methods.
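The three-way criterion above can be written down as a short Python sketch. It is an illustration only, not the analysis code used for this work: the function name `classify_agreement` is hypothetical, and it assumes that the combined error estimate is the plain sum of the two error bars. The example values are the benzene PD dimer energies reported later in the text.

```python
K_B_T = 0.6  # kcal/mol, room-temperature energy window quoted in the text


def classify_agreement(e_cc, err_cc, e_dmc, err_dmc, window=K_B_T):
    """Classify two interaction energies (kcal/mol) given their error bars.

    Assumption: the combined error estimate is the sum of both error bars.
    """
    combined = err_cc + err_dmc
    # Smallest possible gap between the two error intervals (0 if they overlap).
    delta_min = max(0.0, abs(e_cc - e_dmc) - combined)
    if delta_min == 0.0:
        return "statistically indistinguishable", delta_min
    if delta_min < window:
        return "thermodynamically consistent", delta_min
    return "inconsistent", delta_min


# Benzene PD dimer: CCSD(T) -2.67 +/- 0.07, FNDMC -2.38 +/- 0.12 kcal/mol
label, gap = classify_agreement(-2.67, 0.07, -2.38, 0.12)
print(label, round(gap, 2))
```

With these inputs the gap outside the error bars is about 0.1 kcal mol^{−1}, which falls in the thermodynamically consistent class.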
In seven out of nine medium-sized highly polarizable organic molecules computed here, CCSD(T) and FNDMC interaction energies are indistinguishable, and they are thermodynamically consistent in the remaining two. Similarly, CCSD(T) and FNDMC interaction energies are either indistinguishable or thermodynamically consistent for five of the eight supramolecular complexes we consider, covering a range of interactions including hydrogen-bonding and π − π stacking. However, we find that three key complexes reveal several kcal mol^{−1} differences between the best estimated CCSD(T) and FNDMC calculations. Most notably, a substantial disagreement of 7.6 kcal mol^{−1} (or 20%) is found in the interaction energy (E_{int} as defined in Methods) of the buckyball-ring system. This 7.6 kcal mol^{−1} inconsistency remains on top of the uncertainty estimates incorporating all controllable sources of errors. We also gauge the impact of approximations intrinsic to each method, not covered in the numerical uncertainty estimates, and find that 7.6 kcal mol^{−1} is an order of magnitude beyond these. It is thus as yet unclear whether this discrepancy would also be present between the approximation-free CCSD(T) and DMC results or whether it is a result of an unexplored source of error. As shown in Fig. 1 and in Table 1 below, such a sizable deviation cannot be explained solely by the size-extensive growth of the difference between CCSD(T) and FNDMC. Consequently, the interaction energies of three of the supramolecular complexes considered here are still unsettled.
We applied two different, widely used and well-performing DFT approaches developed for capturing long-range dispersion interactions: DFT + D4^{32} and DFT + MBD^{33}. Both methods model London dispersion based on a coarse-grained description and account for all orders of many-body dispersion in different ways. See refs. ^{34,35} for an overview of various ways to capture dispersion in the DFT framework. We find that DFT + MBD closely matches FNDMC, while the recent DFT + D4 method agrees well with CCSD(T), irrespective of the level of disagreement between CCSD(T) and FNDMC. Therefore, the absence of either CCSD(T) or FNDMC references could incorrectly suggest that one of the DFT methods performs better than the other. This illustrates that the unprecedented level of disagreement amongst state-of-the-art methods in large organic molecules has consequences well outside the developer communities.
CCSD(T) and FNDMC methods account for dynamic electron correlation through an expansion in electron configurations in the former and through the projection to the ground state wavefunction in the latter. These two equally viable formulations can be illustrated by the corresponding expressions of Ψ(R), the exact wavefunction:

1. DMC: A propagation according to the imaginary time Schrödinger equation is performed to project out the ‘exact’ electronic ground state from a trial function Ψ_{T}(R):
$$\left|\Psi ({\bf{R}})\right\rangle =\mathop{\lim}\limits_{\tau \to \infty }\exp \left[-\tau (\hat{H}-{E}_{{\rm{T}}})\right]\left|{\Psi }_{{\rm{T}}}({\bf{R}})\right\rangle \qquad (1)$$
2. CC: Expansion of excited determinants generated via the operator \({\hat{T}}_{n}\) from a reference wavefunction:
$$\left|\Psi ({\bf{R}})\right\rangle =\exp \left[\mathop{\sum }\limits_{n=1}^{N}{\hat{T}}_{n}\right]\left|{\Psi }_{{\rm{T}}}({\bf{R}})\right\rangle \qquad (2)$$
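The projection idea behind Eq. (1) can be illustrated on a toy problem. The following Python sketch is not a DMC implementation: it replaces the many-electron Hamiltonian with a hypothetical 3 × 3 symmetric matrix, uses a first-order approximation to the imaginary-time propagator, and applies it deterministically rather than through stochastic walkers. All names and the matrix itself are assumptions chosen for illustration.

```python
import math


def matvec(matrix, vec):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(vec):
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def imaginary_time_projection(hamiltonian, trial, tau=0.1, n_steps=500):
    """Apply (1 - tau (H - E_T)) repeatedly to a trial vector.

    This is the first-order approximation to exp[-tau (H - E_T)]; E_T is
    reset to the current energy expectation value at every step.
    """
    psi = normalize(trial)
    for _ in range(n_steps):
        h_psi = matvec(hamiltonian, psi)
        e_t = dot(psi, h_psi)  # trial energy E_T
        psi = normalize([p - tau * (hp - e_t * p) for p, hp in zip(psi, h_psi)])
    return dot(psi, matvec(hamiltonian, psi)), psi


# Toy Hamiltonian with eigenvalues -1, 0.5, and 2 (so the ground state is -1).
H = [[0.0, -1.0, 0.0],
     [-1.0, 0.5, -1.0],
     [0.0, -1.0, 1.0]]
energy, state = imaginary_time_projection(H, trial=[1.0, 1.0, 1.0])
print(energy)
```

Components of the trial vector along excited states decay at each step while the ground-state component survives, so the energy converges to the lowest eigenvalue provided the trial vector is not orthogonal to the ground state.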
The crucial challenge lies in extensively accounting for relatively small fluctuations in the electron charge densities. To this end, DMC is a stochastic approach, where the wavefunction is described through a set of configurations, otherwise referred to as walkers. Walkers evolve in imaginary time through discrete steps of size Δτ. The stochastic uncertainty associated with any DMC evaluation is inversely proportional to the square root of the number of samples. In order to make this propagation efficient for an electronic wavefunction a few approximations are typically employed: the fixed-node (FN) constraint^{15}, the use of pseudopotentials^{36,37,38,39}, and solutions enhancing the stability of walker populations^{40,41}. In noncovalent interactions, the challenge for FNDMC is to provide precise and accurate evaluations of the interaction energy E_{int}, despite E_{int} being a tiny fraction of the total energy, e.g., it is circa 1/10^{4} of the total energy in the C_{60}@[6]CPPA complex. Precision is achieved by exploiting the almost perfect scaling of DMC on modern supercomputer facilities^{42,43,44} and thanks to recent algorithmic improvements which reduced the timestep bias and made DMC up to 100 times more efficient^{40}. The FNDMC setup employed here has been used and validated against experiments and CCSD(T) a number of times, for instance in refs. ^{18,19,21,45}.
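The inverse-square-root scaling of the stochastic uncertainty has a direct practical consequence, sketched below in Python. The function name is hypothetical and the sketch assumes statistically independent samples; in practice, successive DMC configurations are serially correlated, so the sample count should be read as an effective number of independent samples.

```python
def samples_for_target_error(current_samples, current_error, target_error):
    """Samples needed to reach target_error, assuming error ~ 1 / sqrt(N)."""
    return current_samples * (current_error / target_error) ** 2


# Halving the error bar costs four times the sampling,
# and a tenfold reduction costs a hundred times the sampling.
print(samples_for_target_error(1000, 1.0, 0.5))
print(samples_for_target_error(1000, 1.0, 0.1))
```

This quadratic cost is why resolving an interaction energy that is roughly 1/10^{4} of the total energy demands very long sampling runs.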
In coupled cluster theory, noncovalent interactions require a high-order treatment of many-electron processes, as is included in CCSD(T), and a sufficiently large single-particle basis set. Reaching basis set saturation and well-controlled local approximations concurrently for the studied systems required previously unfeasible computational efforts, as shown by the several kcal mol^{−1} scatter of interaction energy predictions reported for the L7 set (see Fig. 3). Our recent efforts enabled the following: (i) a systematically converging series of local CCSD(T) results is presented for highly complicated complexes, (ii) both the local and the basis set incompleteness (BSI) errors are closely monitored using comprehensive uncertainty measures^{13}, (iii) convergence up to chemical accuracy is reached for the complete L7 set concurrently in the local approximations as well as in the basis set saturation.
The benefit of such demanding FNDMC and CCSD(T) convergence studies is that the resulting interaction energies, up to the respective error bars, can be considered independent of the corresponding approximations. Consequently, we expect that the CBS limit of the exact CCSD(T) results could, in principle, be approached similarly using alternative basis sets^{10,46,47} or local correlation methods^{48,49,50,51}. For instance, using different basis set corrections and local approximations for CCSD(T), but an error estimate reminiscent of our approach presented here, the most recent L7 interaction energies of ref. ^{52} are identical to ours within the corresponding error estimates. In addition, FNDMC interaction energies for the L7 complexes have been reported very recently that are in close agreement with our results, even though a different algorithm and implementation has been employed^{53}.
We use highly optimized algorithms both for FNDMC and CCSD(T) as outlined in Methods, and push them beyond the typically applied limits. We used circa 0.7 and 1 million CPU core hours for FNDMC and CCSD(T), respectively. This is equivalent to running a modern 28-core machine constantly for ~7 years.
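The conversion from core hours to machine time can be checked with a few lines of Python. The helper name is hypothetical, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year


def machine_years(core_hours, cores_per_machine):
    """Wall-clock years needed if one machine runs all cores continuously."""
    return core_hours / cores_per_machine / HOURS_PER_YEAR


total_core_hours = 0.7e6 + 1.0e6  # FNDMC + CCSD(T), as quoted in the text
print(round(machine_years(total_core_hours, 28), 1))  # 6.9
```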
Results
Consensus in medium-sized highly polarizable organic molecules
Demonstrating agreement between fundamentally different electronic structure methods for solving the Schrödinger equation provides a proof-of-principle for the accuracy of the methods beyond technical challenges. To date, disagreements beyond 1 kcal mol^{−1} on molecular systems between CCSD(T) and FNDMC have been reported for systems where key assumptions, e.g., a single-reference wavefunction, an accurate nodal structure, and basis set completeness, were not completely fulfilled^{54,55,56}. Previously, however, CCSD(T) and FNDMC were found to be in agreement, within the error bars, for the interaction energies of small organic molecules with pure dynamic correlation^{9,16,57} as well as some extended systems^{19,21,58}.
Here we extend the list of systems with consistent CCSD(T) and FNDMC interaction energies with nine medium-sized dimers of 22–24 atoms taken from the S66 compilation^{27}. Interaction energies and corresponding error estimates are collected in Table 1 for the parallel displaced (PD) and T-shaped (TS) dimers of benzene, pyridine, and uracil (see Fig. 2). The level of uncertainty in our results throughout this paper is indicated by the sum of local and BSI error estimates for CCSD(T). In FNDMC the main source of uncertainty is the stochastic nature of the approach, which is here accounted for by reporting a confidence interval of 95% (i.e., ± two standard deviations). To err on the side of finding agreement, we define \({{{\Delta }}}_{\min }\) as the absolute minimum difference between best converged CCSD(T) and FNDMC for the closest limits of the corresponding error bars, i.e., the smallest deviation between the methods according to the uncertainty estimates.
For the majority of medium-sized complexes in Table 1, FNDMC and CCSD(T) interaction energies are statistically indistinguishable as \({{{\Delta }}}_{\min }=0\) kcal mol^{−1}, with the exception of the benzene-uracil and the benzene PD dimers.
The benzene PD dimer has garnered much interest as a prototypical example of π − π stacking interaction. Previous predictions of the interaction energy from CCSD(T) give −2.69 kcal mol^{−1}^{59}, in excellent agreement with our LNO-CCSD(T) result of −2.67 ± 0.07 kcal mol^{−1} (see Table 1). However, previous FNDMC predictions of the benzene PD dimer use marginally different structures and algorithms, resulting in a wide range of predicted interaction energies^{60,61,62}. Here, using the latest DMC algorithms and well-converged stochastic error bars, we predict −2.38 ± 0.12 kcal mol^{−1} to be the interaction energy of the PD benzene dimer (S66 structure) from FNDMC. This result is robust with respect to different nodal structures (as can be seen in Supplementary Note 2 B) and is therefore unlikely to be affected by the fixed-node approximation. This leaves a −0.29 ± 0.19 kcal mol^{−1} discrepancy, or \({{{\Delta }}}_{\min }=0.1\) kcal mol^{−1} between FNDMC and CCSD(T), which is 11 ± 7% of the interaction energy. While the relative discrepancy can be considered nonnegligible, evidently the absolute energy difference is well within thermodynamic consistency. Therefore, even with well-defined error bars, CCSD(T) and FNDMC interaction energies are thermodynamically consistent for weakly interacting medium-sized dimers.
Losing consensus on supramolecular interactions
Establishing agreement for systems at the 100 atom range has been hindered by the sizable or unavailable error estimates for finite systems^{9}. For example, binding energies of large host-guest complexes derived from experimental association free energies^{63,64} motivated previous FNDMC^{65} as well as local CCSD(T)^{66} computations. While the average discrepancy of these FNDMC and local CCSD(T) binding energies was found to be about 2.4 kcal mol^{−1}, it is not possible to make conclusive remarks on the consistency of these results. Uncertainty estimates are unavailable for local CCSD(T), but could be comparable to the average discrepancy, while the error estimates reported for both experimental and FNDMC energies reach up to a few kcal mol^{−1}.
Here, we consider similar but somewhat smaller supramolecular complexes (Fig. 2) and obtain tightly converged local CCSD(T) and FNDMC results sufficient for rigorous comparisons (see Fig. 3 and Table 1). The complexes are arranged in Fig. 3 according to increasing interaction strength, which roughly scales with the size of the interacting surface. CCSD(T) and FNDMC agree on the interaction energy to within 0.1 kcal mol^{−1}, taking error bars into account, for a subset of the complexes we consider: GGG, CBH, GCGC, C3A, and PHE. These complexes are between 48 and 112 atoms in size and exhibit π − π stacking, hydrogen-bonding, and dispersion interactions. Therefore, the agreement for these five complexes indicates that their absolute interaction energies are established references and can be used to benchmark other methods for large molecules. Here, relative differences of very small interaction energies have to be interpreted carefully as they are sensitive to the uncertainty estimates. In GGG for example, the results are statistically indistinguishable whilst the relative disagreement is up to 65%. In contrast, the relative disagreement between FNDMC and CCSD(T) is better resolved in the more strongly interacting C_{60}@[6]CPPA complex, at 18–33%.
A salient and surprising finding is the disagreement between state-of-the-art methods on the interaction energy of three nontrivial complexes: coronene dimer (C2C2PD), circumcoronene-GC base pair (C3GC), and buckyball-ring (C_{60}@[6]CPPA). The minimum differences (\({{{\Delta }}}_{\min }\)), as indicated in Table 1 and Fig. 3, are 1.1, 2.2, and 7.6 kcal mol^{−1} for C2C2PD, C3GC, and C_{60}@[6]CPPA, respectively, but the disagreements could be as high as 3.9, 6.9, and 13.7 kcal mol^{−1}, respectively. Considering the comparable size of C3A, PHE, and CBH to C2C2PD, C3GC, and C_{60}@[6]CPPA, the \({{{\Delta }}}_{\min }\) values of the latter three complexes are not explained simply by the large size or the large area of the interacting surface. CCSD(T) predicts consistently stronger interaction in these complexes than FNDMC, but at this point it is unclear what the exact interaction energies are.
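The lower and upper limits quoted above follow from the closest and farthest edges of the two error intervals. A minimal Python sketch, using the C2C2PD energies reported elsewhere in this paper (CCSD(T): −20.6 ± 0.6 kcal mol^{−1}; FNDMC: −18.1 ± 0.8 kcal mol^{−1}) and assuming that the error bars are simply added, reproduces the 1.1 and 3.9 kcal mol^{−1} values. The function name is hypothetical.

```python
def disagreement_bounds(e_a, err_a, e_b, err_b):
    """Return (smallest, largest) difference allowed by the two error bars."""
    gap = abs(e_a - e_b)
    combined = err_a + err_b  # assumption: error bars are added linearly
    return max(0.0, gap - combined), gap + combined


# C2C2PD (kcal/mol): CCSD(T) -20.6 +/- 0.6, FNDMC -18.1 +/- 0.8
d_min, d_max = disagreement_bounds(-20.6, 0.6, -18.1, 0.8)
print(round(d_min, 1), round(d_max, 1))  # 1.1 3.9
```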
C2C2PD has attracted the most attention to date in the CCSD(T) context as it represents a stepping stone between two widely studied systems: benzene dimer and graphene bilayer^{9}. Already C2C2PD has posed a significant challenge to various local CCSD(T) methods due to its slowly decaying long-range interactions^{13,49,51,52,67,68,69,70}. Considerable efforts have been devoted recently^{13,49,51} to narrow down the local CCSD(T) interaction energy of C2C2PD to the range of about −19 to −21 kcal mol^{−1}. Thus the presently reported −20.6 ± 0.6 kcal mol^{−1} interaction energy and previous local CCSD(T) results, containing analogous local approximations, consistently indicate stronger interaction than FNDMC for C2C2PD. This trend, to a smaller extent, is also seen in the PD benzene complex and size-extensive error propagation might be expected, but it is clearly insufficient to explain the 18–31% relative disagreement found in C_{60}@[6]CPPA, for example.
Distinct errors using DNA base molecules on circumcoronene
The C3GC and C3A complexes are ideal for assessing the convergence of CCSD(T) and FNDMC, due to their chemical similarity and the importance of π − π stacking interactions, i.e., nucleobases stacked on circumcoronene. CCSD(T) and FNDMC agree within 1 kcal mol^{−1} for the interaction energy of C3A, whereas there is a notable disagreement of at least 2.2 kcal mol^{−1} in the interaction energy of C3GC. Interestingly, both systems involve similar interaction mechanisms, with C3GC exhibiting both stacking and hydrogen-bonding interactions.
CCSD(T) and FNDMC interaction energies involve multiple approximations. In Fig. 4 we analyze the most critical approximations for each method using the example of the C3A and C3GC complexes, and we also consider the other remaining known sources of error in Methods.
In obtaining CCSD(T) interaction energies, the sources of error are:

Single-particle basis representation of the CCSD(T) wavefunction.

Local approximations of long-range electron correlation according to the LNO scheme.

Neglected core electron correlation.

Missing high-order many-electron contributions beyond CCSD(T).
For the single-particle basis representation in CCSD(T) we employed conventional correlation-consistent basis sets augmented with diffuse functions^{71}, aug-cc-pVXZ (X = T, Q, and 5), as shown in panel a) of Fig. 4. The remaining BSI is alleviated using extrapolation^{72} toward the complete basis set (CBS) limit [CBS(X,X + 1), X = T, Q], and counterpoise (CP) corrections^{73}. The local errors decrease systematically as the LNO threshold sets are tightened (Normal, Tight, very Tight), enabling extrapolations, e.g., Normal–Tight (N–T), to estimate the canonical CCSD(T) interaction energy^{13} (see panel b) of Fig. 4). Exploiting the systematic convergence properties, an upper bound for both the local and the BSI errors can be given without relying on their potential cancellation (see Methods).
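The two basis set corrections named above can be sketched in Python. The exact extrapolation formula is not spelled out in this section, so the code assumes the commonly used two-point inverse-cubic form for the correlation energy, E(X) = E_CBS + A/X^3; the function names are hypothetical and the numerical inputs are synthetic, not results from this work.

```python
def cbs_two_point(e_x, e_y, x, y):
    """Two-point CBS estimate, assuming E(X) = E_CBS + A / X**3.

    x and y are the cardinal numbers (e.g. 3 for T, 4 for Q, 5 for 5).
    """
    return (y**3 * e_y - x**3 * e_x) / (y**3 - x**3)


def counterpoise_interaction(e_dimer, e_frag_a, e_frag_b):
    """CP-corrected interaction energy.

    All three energies are assumed to be computed in the full dimer basis,
    so that basis set superposition effects largely cancel.
    """
    return e_dimer - e_frag_a - e_frag_b


def synthetic_energy(x):
    """Synthetic check data generated from E_CBS = -10 and A = 2."""
    return -10.0 + 2.0 / x**3


print(cbs_two_point(synthetic_energy(3), synthetic_energy(4), 3, 4))
```

When the input energies follow the assumed form exactly, as in this synthetic check, the extrapolation recovers the limiting value; for real data the difference between CBS(T,Q) and CBS(Q,5) gives a sense of the remaining BSI error.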
Benchmarks presented previously for energy differences of a broad variety of systems showed excellent overall accuracy at the Normal–Tight extrapolated LNO-CCSD(T)/CBS(T,Q) level (M1)^{13}. However, the BSI error bar of 1.0 kcal mol^{−1} and the local error bar of 2.2 kcal mol^{−1} obtained for C3GC at this M1 level are impractical for a definitive comparison with FNDMC. The next steps along both series of approximations toward chemical accuracy, i.e., the use of very Tight LNO thresholds and the aug-cc-pV5Z basis set (M2), have been enabled by our recent method development efforts^{12,13,74}. With these better converged interaction energies, the M2 level uncertainty estimates are up to a factor of three smaller than at the M1 level. Explicitly, a local (BSI) error estimate of 0.7 (0.4) kcal mol^{−1} is obtained for C3GC. The same measures are the largest for C_{60}@[6]CPPA at the M2 level, being 1.1 and 0.6 kcal mol^{−1}, respectively. Moreover, for the remaining L7 complexes, the local (BSI) uncertainty estimates indicate even better convergence of 0.1–0.4 (0.1–0.3) kcal mol^{−1}. Additional details are provided in Methods and in Supplementary Note 1 of the Supplementary Material (SM).
Known sources of error to consider in our FNDMC calculations are:

The fixed-node approximation, which restricts the nodal structure to that of the input guiding wavefunction.

Timestep bias from the discretization of imaginary time for propagating the wavefunction.

Pseudopotentials to approximate core electrons for each atom.
First, we analyze the most pertinent source of error in FNDMC, which is the fixed-node approximation. The different nodal surfaces from DFT methods serve to indicate the dependence of the FNDMC interaction energy on the nodal structure. Indeed, from Fig. 4, we find no indication that the FNDMC interaction energy of C3GC is affected by the nodal structure, with the results being statistically indistinguishable. This shows that DMC estimations are robust for different ways to initialize the orbitals in a Slater-Jastrow ansatz. However, we cannot conclude that the FN bias is negligible, as it is possible that a more involved multireference ansatz yields different results. Unfortunately, a multireference ansatz implies a much larger computational effort, which is not yet possible on the large systems discussed here. Second, FNDMC energies are sensitive to the timestep and we rely on recent improvements in FNDMC algorithms^{36,40} that enable converged results with timesteps as large as 0.05 a.u. We used 0.03 a.u. and 0.01 a.u. timesteps to compute the interaction energies of C3A and C3GC. Figure 4 indicates that the interaction energy is statistically indistinguishable for the different timesteps considered here for both C3A and C3GC. The timestep and fixed-node approximations perform similarly well for the coronene dimer and the buckyball-ring complex (see Supplementary Note 2 C and D of the SM). Third, recently reported all-electron FNDMC interaction energies for the L7 complexes^{53} are in agreement with our pseudopotential-based FNDMC results. Therefore, our FNDMC interaction energies are also robust with respect to the use of pseudopotentials.
Open challenges for the next generation of many-body methods
CCSD(T) and FNDMC have been shown to agree with subchemical accuracy for small organic dimers^{9,16,57}, molecular crystals^{18,19}, and small physisorbed molecules on surfaces^{21,58}. Indeed, we also find good agreement in the absolute interaction energies for five of the eight complexes considered here. However, we find that the disagreement by several kcal mol^{−1}, in C_{60}@[6]CPPA particularly, cannot be explained by the controllable sources of error. While both methods are highly sophisticated, they are still approximations to the exact solution of the many-electron Schrödinger equation. Moreover, there can be nontrivial coupling between approximations within each method, which remains poorly understood for complex many-electron wavefunctions.
Are we there yet with FNDMC?
The reported interaction energies of C2C2PD, C3GC, and C_{60}@[6]CPPA indicate that FNDMC stabilizes the interacting complexes more weakly than estimated by CCSD(T). Therefore, one possibility for the discrepancy between the methods is that FNDMC (as applied here) does not capture the correlation energy in the bound complexes sufficiently. Reasons for this can include the fixed-node approximation and, more generally, insufficient flexibility in the wavefunction ansatz.
The Slater-Jastrow ansatz was applied here using a single determinant combined with a Jastrow factor containing explicit parameterizable functions to describe electron-electron, electron-nucleus, and electron-electron-nucleus interactions. We have evaluated FNDMC interaction energies for different nodal structures for C3GC, C2C2PD, and C_{60}@[6]CPPA, and in all cases the FNDMC interaction energies are in 1σ agreement (see Supplementary Note 2) with stochastic errors that are mostly under 1 kcal mol^{−1}. Among these systems, the largest potential deviation (Δ_{max}) due to the fixed-node error is estimated to be ~3.7 kcal mol^{−1} in C_{60}@[6]CPPA. Although this potentially large source of error is not enough to explain the 7.6 kcal mol^{−1} \({{{\Delta }}}_{\min }\) disagreement with CCSD(T), it remains a pertinent issue for establishing chemical accuracy. Reducing the fixed-node error in such large molecules, for example by using more than one Slater determinant to systematically improve the nodal structure, remains challenging^{75,76}. Promising alternatives include the Jastrow antisymmetrized geminal power approach, which has recently been shown to recover near-exact results for a small, strongly correlated cluster of hydrogen atoms^{77}.
The Jastrow factor is a convenient approach to increase the efficiency of FNDMC since in the zero timestep limit and with sufficient sampling, the FNDMC energy is independent of this term. However, the quality of the Jastrow factor can be nonuniform for the bound complex and the noninteracting fragments, which can introduce a bias at larger timesteps. The recent DLA method in FNDMC reduces this effect^{36} and was applied to the C_{60}@[6]CPPA complex reported in Table 1 and also tested for GGG, C3A, and C2C2PD complexes (see Methods for further details). In all cases, FNDMC with DLA is in agreement (95% confidence interval) with non-DLA FNDMC interaction energies. For example, the C2C2PD FNDMC interaction energy with DLA is −17.4 ± 1.0 kcal mol^{−1} whilst with standard LA, it is −18.1 ± 0.8 kcal mol^{−1}. Moreover, the interaction strengths tend toward being weaker with DLA in the systems we consider, i.e., further from the CCSD(T) interaction energies. As such, the discrepancy between FNDMC and CCSD(T) remains regardless of any potential error from the Jastrow factor in our findings.
We estimate the error from the use of Trail and Needs pseudopotentials^{78,79} in FNDMC at the Hartree-Fock (HF) level using the interaction energy of C2C2PD. We find a 0.1 kcal mol^{−1} difference in the HF interaction energy with the employed pseudopotentials and without (i.e., all-electron), which is well within the acceptable uncertainty for our findings. In addition, recently computed all-electron FNDMC interaction energies of the L7 data set are in agreement with our predictions^{53}.
In principle, a more flexible wavefunction ansatz allows a more accurate many-body wavefunction to be reached in DMC, thus recovering electron correlation more effectively. To this end, recently introduced machine learning approaches^{80,81} are promising but more expensive due to the considerable increase in parameters. However, once feasible, a systematic assessment of the amount of electron correlation recovered by these different ansätze in noncovalently bound systems will bring valuable insight to the current puzzle.
Potential avenues for improvement upon CCSD(T)
Considering the complexes exhibiting significant π − π interactions, CCSD(T) is found to predict stronger interaction than FNDMC. Approximations used for some of the small, long-range energy contributions in local CC methods^{49,82} could potentially lead to overestimated interactions. In the case of the LNO scheme, the majority of the local approximations have marginal effect on the interaction energies when very Tight settings are employed^{13}. For the most complicated case of C_{60}@[6]CPPA, only 7% (−2.9 kcal mol^{−1}) of its interaction energy results from long-range contributions approximated at the second order, while the full many-body treatment up to the CCSD(T) level is utilized for the remaining 93% (see Eq. 1 of the SM). While the corresponding 1.1 kcal mol^{−1} error bar appears to estimate the local approximations well (see Supplementary Note 1 B), remaining uncertainties outside of the presented error bars cannot be ruled out.
The employed singleparticle basis sets perform exceptionally well for CCSD(T) computations of small molecules^{71,72}, but approaching the CBS limit of CCSD(T) for large systems is mostly an uncharted territory in the literature^{13,49}. The agreement of CP corrected CBS(T,Q), CBS(Q,5), and uncorrected CBS(Q,5) within 0.06–0.36 kcal mol^{−1} is highly satisfactory (see Supplementary Note 1 A). Currently, Gassian functions based implementations appear to approach the CBS limit of CCSD(T) for extended systems. However, alternative CC methods utilizing planewave or realspace representations^{10,46,47} as well as explicitly correlated wavefunction forms^{49,51} could offer advantages to overcome the basis set superposition error and relatively slow convergence associated with Gaussian basis sets for delocalized systems.
The higherorder contribution of three, four, etc. electron processes on top of CCSD(T)^{83,84} are usually found to be negligible for weaklycorrelated molecules^{57}. However, the available numerical experience is limited to complexes below about a dozen atoms, and for some highlypolarizable systems the beyond CCSD(T) treatment of threeelectron processes has been shown to contribute significantly to threebody dispersion^{85}. The weaklycorrelated nature of all complexes is indicated by the perturbative (T) contribution to the total correlation energy component of the CCSD(T) interaction energy being consistently 18–20%. In addition, the CC amplitude based measures all point to pure dynamic correlation (see Supplementary Note 1 B). Due to the extreme computational cost of such higherorder CC computations, it remains an open and considerable challenge to establish whether the contribution of higherorder processes is within subchemical accuracy for larger and more complex molecules.
Insights from experiments and comparison with density-functional approximations
Experimental binding energies or association constants of supramolecular complexes are particularly valuable, when available, but also have their limitations, as back-corrections are needed to separate, for example, the effects of thermal fluctuations and the solvent^{86}. In the case of C_{60}@[6]CPPA, the association constant is measured in a benzene solution and indicates a stable encapsulated complex, but one which could not be well characterized by X-ray crystallography, purportedly due to the rapid rotation of the buckyball guest^{87}. Instead, a non-fully encapsulated structure was successfully characterized using toluene anchors on the buckyball. This demonstrates a number of physical leaps that exist between what can be measured and what can be accurately computed.
Other high-level methods, such as the full configuration interaction quantum Monte Carlo (FCIQMC) method^{10,46}, can be key to assessing the shortcomings arising from major approximations such as the FN approximation and from static correlation. Once the severe scaling with system size associated with FCIQMC and similar methods is addressed, larger molecules will become feasible. However, at present the lack of references for large systems remains a salient problem.
The scarcity of reference information has an impact on all other modelling methods, including density-functional approximations (DFAs) and semiempirical, force-field, or machine-learning-based models, which are validated or parameterized based on higher-level benchmarks. In particular, there is a race to simulate larger, more anisotropic, and complex materials, accompanied by a difficulty of choice for modelling methods. To demonstrate the consequences of inconsistent references, Fig. 5 shows interaction energy discrepancies obtained with two DFAs, PBE0+D4^{32} and PBE0+MBD^{33}, which are both designed to capture all orders of many-body dispersion interactions, albeit in different ways. Intriguingly, the PBE0+D4 method is in close agreement with CCSD(T) (mean absolute deviation, MAD = 1.1 kcal mol^{−1}), whereas PBE0+MBD is closer to FN-DMC (MAD = 1.5 kcal mol^{−1}), but their performance is hard to characterize when CCSD(T) and FN-DMC disagree. Moreover, we decomposed the interaction energies from the DFAs into dispersion components and find that, for C_{60}@[6]CPPA, the main difference between PBE0+MBD and PBE0+D4 is 6.5 kcal mol^{−1} in the two-body dispersion contribution. Differences in beyond-two-body dispersion interactions are smaller and at most 1.6 kcal mol^{−1} in C_{60}@[6]CPPA.
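As a hedged illustration of the MAD metric quoted above, the following Python sketch computes the mean absolute deviation between a set of method interaction energies and a set of reference values. The function name `mean_absolute_deviation` and all numbers are hypothetical placeholders chosen for this example; they are not data from this work.

```python
def mean_absolute_deviation(method_energies, reference_energies):
    """Mean of |E_method - E_reference| over matched complexes (same units)."""
    if len(method_energies) != len(reference_energies):
        raise ValueError("energy lists must have the same length")
    if not method_energies:
        raise ValueError("at least one complex is required")
    deviations = [abs(m - r) for m, r in zip(method_energies, reference_energies)]
    return sum(deviations) / len(deviations)


# Hypothetical interaction energies in kcal/mol (placeholders, not data from the paper)
dfa = [-10.0, -20.5, -31.0]
reference = [-11.0, -20.0, -30.0]
mad = mean_absolute_deviation(dfa, reference)
```

Because the MAD depends on which reference is chosen, the same DFA can appear accurate against one reference method and less so against another, which is the difficulty described in the paragraph above.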
Discussion
Until now, disagreements between reference interaction energies of extended organic complexes have typically been ascribed to unconverged results due to practical bottlenecks. Here, we report highly converged results at the frontier of wavefunction-based methods, uncovering a disconcerting level of disagreement in the interaction energy for three supramolecular complexes. We have computed interaction energies from CCSD(T) and FN-DMC for a set of supramolecular complexes of up to 132 atoms exhibiting challenging intermolecular interactions. The accuracy of these methods has been repeatedly corroborated in the domain of dozen-atom systems with single-reference character, and here we find that CCSD(T) and FN-DMC are in excellent agreement for five of the supramolecular complexes, suggesting that these methods are able to maintain remarkable accuracy in some larger molecules. However, FN-DMC and CCSD(T) interaction energies disagree by 1.1 kcal mol^{−1} in the coronene dimer (C2C2PD), 2.2 kcal mol^{−1} in the GC base pair on circumcoronene (C3GC), and 7.6 kcal mol^{−1} in a buckyball-ring complex (C_{60}@[6]CPPA). These disagreements are cemented by reporting sub-kcal mol^{−1} standard deviations in FN-DMC and a systematically converging series of local CCSD(T) interaction energies accompanied by uncertainty estimates approaching chemical accuracy. Therefore, despite our best efforts to suppress all controllable sources of error, the marked disagreement of FN-DMC and CCSD(T) prevents us from providing conclusive reference interaction energies for these three complexes. Such large differences in interaction energies surpass the widely sought 1 kcal mol^{−1} chemical accuracy and indicate that the highest level of caution is required even for our most advanced tools when employed at the hundred-atom scale.
The supramolecular complexes we report feature π-π stacking, hydrogen bonding, and intermolecular confinement, which are ubiquitous across natural and synthetic materials. Thus our immediate goals are to elucidate the sources of the underlying discrepancies and to explore the scope of systems where such deviations between reference wavefunction methods occur. Well-defined reference interaction energies and the better characterization of their predictive power have growing importance as they are frequently applied in chemistry, materials science, and the biosciences. Our findings should motivate cooperative efforts between experts in computational and experimental methods in obtaining well-defined interaction energies and thereby extending the predictive power of first-principles approaches across the board.
Methods
The L7 structures have been defined by Sedlak et al.^{28} and can be found in the BEGDB database^{88}. Note that the interaction energy, E_{int}, is defined with respect to two fragments even where the complex consists of more than two molecules (as in GGG, GCGC, PHE, and C3GC):
\[{E}_{\text{int}}={E}_{\text{com}}-{E}_{\,\text{frag}\,}^{1}-{E}_{\,\text{frag}\,}^{2}\]
where E_{com} is the total energy of the full complex, and \({E}_{\,\text{frag}\,}^{1}\) and \({E}_{\,\text{frag}\,}^{2}\) are the total energies of isolated fragments 1 and 2, respectively. The fragment molecules have the same geometry as in the full complex, i.e., they are not relaxed. Further details on the configurations can be found in the SM and in ref. ^{28}.
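A minimal Python sketch of this two-fragment definition is given here for concreteness, assuming the usual supermolecular difference E_int = E_com − E_frag^1 − E_frag^2 with all three total energies evaluated at the frozen geometry of the complex. The function names, the hartree-to-kcal mol^{−1} conversion step, and the example energies are assumptions made for illustration and are not taken from this work.

```python
HARTREE_TO_KCAL_MOL = 627.509474  # standard conversion factor, 1 hartree in kcal/mol


def interaction_energy(e_complex, e_frag1, e_frag2):
    """E_int = E_com - E_frag1 - E_frag2; all inputs in the same energy unit."""
    return e_complex - e_frag1 - e_frag2


def hartree_to_kcal_mol(energy_hartree):
    """Convert an energy from hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_MOL


# Hypothetical total energies in hartree (placeholders, not data from the paper)
e_int_hartree = interaction_energy(-10.015, -6.0, -4.0)
e_int_kcal = hartree_to_kcal_mol(e_int_hartree)
```

A negative value indicates a bound complex under this sign convention. No relaxation energy of the fragments enters, consistent with the frozen fragment geometries described above.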
The C_{60}@[6]CPPA complex is based on similar complexes in previous theoretical and experimental works^{65,89,90} and has been chosen to represent confined π-π interactions that are numerically still tractable by our methodologies. Its geometry has been symmetrized to the D_{3d} point group, and the individual fragments of C_{60} and [6]CPPA are kept frozen (I_{h} and D_{6h}, respectively). The structure is provided in the SM.
The local natural orbital CCSD(T) method
In order to reduce the N^{7}-scaling of canonical CCSD(T) with respect to the system size (N), the inverse sixth power decay of pairwise interactions can be exploited (local approximations) and the wavefunction can be compressed further via natural orbital (NO) techniques^{82}. Building on such cost-reduction techniques, a number of highly efficient local CCSD(T) methods have emerged in the past decade^{12,13,48,49,50,51,82,91,92}. As the local-approximation-free CCSD(T) energy can be approached by the simultaneous improvement of all local truncations in most of these techniques, all local CCSD(T) methods are, in principle, expected to converge to the same interaction energy. Here we employ the local natural orbital CCSD(T) [LNO-CCSD(T)] scheme^{12,93}, which, for the studied systems, brings the feasibility of exceedingly well-converged CCSD(T) calculations in line with FN-DMC. The approximations of the LNO scheme automatically adapt to the complexity of the underlying wavefunction and enable systematic convergence toward the exact CCSD(T) correlation energy, with up to 99.99% accuracy using sufficiently tight settings^{13}.
The price of improvable accuracy is that the computational requirements can drastically increase depending on the nature of the wavefunction: while LNO-CCSD(T) has been successfully employed for macromolecules, such as small proteins in the 1000-atom range^{12,13}, the sizable long-range interactions appearing in the complexes studied here pose a challenge for any local CCSD(T) method^{13,48,49,51}. This motivated the implementation of several recent developments in our algorithm and computer code over the lifetime of this project, which cumulatively resulted in a decrease of about 2–3 orders of magnitude in the time-to-solution and data storage requirement of LNO-CCSD(T)^{12,13,93}, and made well-converged computations feasible for all complexes. For instance, we have designed a massively parallel conventional CCSD(T) code specifically for applications within the LNO scheme^{94} and integrated it with our highly optimized LNO-CCSD(T) algorithms^{12,13,93}. Here, we report the first large-scale LNO-CCSD(T) applications which exploit the resulting high-performance capabilities using the most recent implementation of the Mrcc package^{74} (release date February 22, 2020).
Computational details for CCSD(T)
The LNO-CCSD(T)-based CCSD(T)/CBS estimates were obtained as the average of CP-corrected and uncorrected (“half CP”)^{73}, Tight–very Tight extrapolated LNO-CCSD(T)/CBS(Q,5) interaction energies^{13}. Except for C3A, C3GC, and C_{60}@[6]CPPA, the CBS(Q,5) notation refers to CBS extrapolation^{72} using aug-cc-pVXZ basis sets^{71} with X = Q and 5. For C3A, C3GC, and C_{60}@[6]CPPA, a Normal LNO-CCSD(T)/CBS(Q,5)-based BSI correction (Δ_{BSI}) was added to the Tight–very Tight extrapolated LNO-CCSD(T)/aug-cc-pVTZ interaction energies, exploiting the parallel convergence of the LNO-CCSD(T) energies for these basis sets^{13}. Error bars accompanying the LNO-CCSD(T) interaction energies of Fig. 3 and Table 1, and determining the interval enclosed by the dashed lines on panels (a) and (b) of Fig. 4, are the sums of the BSI and local error estimates. The BSI error measure is the maximum of two separate error estimates: the difference between CP-corrected and uncorrected CBS(Q,5) energies, and the difference between CP-CBS(T,Q) and CP-CBS(Q,5) results. This BSI error bar is increased with an additional term if Δ_{BSI} is employed, according to Supplementary Note 1 A. Local error bars shown, e.g., on panel (b) of Fig. 4 are obtained via the extrapolation scheme of ref. ^{13}. Explicitly, the local error bar of the best estimated CCSD(T) results (see Table S I) is calculated from the difference of the Tight and very Tight LNO-CCSD(T) results evaluated with the largest possible basis sets^{13}.
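The bookkeeping described in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual workflow: the inverse-cubic two-point form in `cbs_two_point` is the commonly used formula for correlation energies and is assumed here (the text only cites ref. ^{72} for the extrapolation), the local error estimate is taken as a given input, and all function names and numerical values are hypothetical.

```python
def cbs_two_point(e_small, e_large, x_small, x_large):
    """Assumed inverse-cubic two-point extrapolation of a correlation energy.

    e_small and e_large are energies obtained with cardinal numbers
    x_small < x_large (e.g., 4 and 5 for a (Q,5) extrapolation).
    """
    a, b = x_small ** 3, x_large ** 3
    return (b * e_large - a * e_small) / (b - a)


def half_cp(e_cp_corrected, e_uncorrected):
    """Average of CP-corrected and uncorrected interaction energies ("half CP")."""
    return 0.5 * (e_cp_corrected + e_uncorrected)


def bsi_error(e_cp_q5, e_uncorrected_q5, e_cp_tq, extra_term=0.0):
    """Maximum of the two BSI estimates, plus an optional additional term."""
    cp_gap = abs(e_cp_q5 - e_uncorrected_q5)   # CP-corrected vs. uncorrected CBS(Q,5)
    cardinal_gap = abs(e_cp_tq - e_cp_q5)      # CP-CBS(T,Q) vs. CP-CBS(Q,5)
    return max(cp_gap, cardinal_gap) + extra_term


def total_error_bar(bsi, local):
    """Sum of the BSI and local error estimates."""
    return bsi + local


# Hypothetical interaction energies in kcal/mol (placeholders, not data from the paper)
estimate = half_cp(-10.0, -10.4)
bsi = bsi_error(e_cp_q5=-10.0, e_uncorrected_q5=-10.4, e_cp_tq=-10.1)
error_bar = total_error_bar(bsi, local=0.3)
```

Summing the two error estimates, rather than combining them in quadrature, follows the wording of the paragraph above and gives the more conservative interval.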
Computational details for FNDMC
Our FN-DMC calculations use the Slater-Jastrow ansatz with single Slater determinants obtained from DFT. The Jastrow factor for each system contains explicit electron-electron, electron-nucleus, and three-body electron-electron-nucleus terms. The parameters of the Jastrow factor were optimized for each complex using the variational Monte Carlo (VMC) method and the varmin algorithm, which allows for systematic improvement of the trial wavefunction, as implemented in CASINO v2.13.610^{95}. Note that bound complexes were used in the VMC optimizations and the resulting Jastrow factor was used to compute the corresponding fragments. All systems were treated in real space as non-periodic open systems in VMC and FN-DMC.
We performed FN-DMC simulations using the size-consistent ZSGMA algorithm^{40}. Trail and Needs pseudopotentials^{78,79} were used for all elements with the locality approximation (LA) for the nonlocal pseudopotentials^{39} and a 0.03 a.u. timestep for all L7 complexes. Smaller timesteps of 0.003 and 0.01 a.u. were also used to compute the interaction energy of the C2C2PD complex, and the interaction energy was found to be in agreement within the stochastic error bars for all three timesteps.
The C_{60}@[6]CPPA complex exhibited numerical instability using the standard LA. This prevented sufficient statistical sampling and therefore we computed this complex with two alternative and more numerically stable approaches. First, the energy reported in Fig. 3 and Table 1 was obtained using the recently developed determinant localization approximation (DLA)^{36} as implemented in CASINO v2.13.809^{95}. The DLA (i) gives better numerical stability than the LA algorithm, allowing for more statistics to be accumulated, (ii) has a smaller dependence on the Jastrow factor, and (iii) addresses an indirect issue related to the use of nonlocal pseudopotentials. Second, the T-move approximation^{38} (without DLA) was also applied to C_{60}@[6]CPPA for comparison. The T-move scheme is more numerically stable than the standard LA algorithm but is also more timestep dependent, and therefore we used results from 0.01 and 0.02 a.u. timesteps to extrapolate the interaction energy to the zero-timestep limit, as reported in the SM. The extrapolated interaction energy with the T-move scheme is −31.14 ± 2.57 kcal mol^{−1} using the LDA nodal structure and −29.16 ± 2.33 kcal mol^{−1} using the PBE0 nodal structure. Due to the large stochastic error on these results, we report the better converged DLA-based interaction energy (with PBE0 nodal structure) in the main results, but we note that all three predictions from FN-DMC agree within the statistical error bars. Furthermore, as the DLA is less sensitive to the Jastrow factor at finite timesteps, we have also tested the interaction energies of the GGG, C3A, and C2C2PD complexes with the DLA, finding agreement with the LA-based FN-DMC results within one standard deviation. Further details can be found in the SM.
The initial DFT orbitals (which define the nodal structure in FN-DMC) were prepared using PWSCF in Quantum Espresso v.6.1^{96} with a plane-wave energy cutoff of 500 Ry. The plane-wave representation of the molecular orbitals from PWSCF was expanded in terms of B-splines. Since PWSCF uses periodic boundary conditions, all complexes were centered in an orthorhombic unit cell with a vacuum spacing of ~8 Å in each Cartesian direction to ensure that the single-particle orbitals are fully enclosed. LDA orbitals were used for the L7 complexes and, in addition, PBE0 orbitals were considered for C2C2PD, C3GC, and C_{60}@[6]CPPA. In all cases, the final FN-DMC interaction energies from the LDA and PBE0 nodal structures are in agreement within the stochastic errors.
FN-DMC evaluations of the interaction energy in nine complexes in the S66 set, entries 24 to 29 and 47 to 49, were performed with a similar setup. We used the latest version of the Trail and Needs pseudopotentials^{97}, and we employed the DLA. LDA orbitals were used for the wavefunction ansatz, but PBE and PBE0 orbitals were also tested on the benzene dimer (see SM).
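The two-timestep extrapolation and the comparison of results within their error bars can be sketched in Python as follows. This is an illustrative sketch, not the procedure reported in the SM: it assumes a linear dependence of the energy on the timestep, statistically independent estimates, and an agreement criterion based on the combined standard error. The function names and the finite-timestep energies are hypothetical; only the two extrapolated T-move values in the final comparison are taken from the text above.

```python
import math


def extrapolate_to_zero_timestep(tau_a, e_a, sigma_a, tau_b, e_b, sigma_b):
    """Linear two-point extrapolation of E(tau) to tau = 0 with error propagation.

    Assumes E(tau) = E0 + b * tau and independent stochastic errors.
    """
    if tau_a == tau_b:
        raise ValueError("two distinct timesteps are required")
    w_a = tau_b / (tau_b - tau_a)    # weight of the energy at tau_a
    w_b = -tau_a / (tau_b - tau_a)   # weight of the energy at tau_b
    e_zero = w_a * e_a + w_b * e_b
    sigma_zero = math.hypot(w_a * sigma_a, w_b * sigma_b)
    return e_zero, sigma_zero


def agree_within_errors(e_1, sigma_1, e_2, sigma_2, n_sigma=1.0):
    """True if the difference is within n_sigma combined standard errors."""
    return abs(e_1 - e_2) <= n_sigma * math.hypot(sigma_1, sigma_2)


# Hypothetical finite-timestep interaction energies in kcal/mol (placeholders)
e0, s0 = extrapolate_to_zero_timestep(0.01, -30.0, 1.0, 0.02, -29.0, 1.0)

# Extrapolated T-move values quoted in the text (LDA vs. PBE0 nodal structures)
consistent = agree_within_errors(-31.14, 2.57, -29.16, 2.33)
```

With timesteps of 0.01 and 0.02 a.u. the weights are 2 and −1, so the stochastic error of the extrapolated energy is more than twice that of the individual runs. This is in line with the large error bars quoted for the T-move results.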
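The cell construction described above can be sketched as a small Python helper. It is hypothetical and makes one explicit assumption: the ~8 Å vacuum spacing is interpreted as the total empty length added to the molecular extent along each Cartesian axis, which the text does not specify. The function name and the coordinates are placeholders for illustration only.

```python
def center_in_orthorhombic_cell(coords, vacuum=8.0):
    """Return (cell_lengths, shifted_coords) for a molecule centered in a box.

    coords: list of (x, y, z) tuples in angstrom.
    vacuum: assumed total empty length added along each axis, in angstrom.
    """
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    # Box edge = molecular extent + vacuum along each Cartesian direction
    cell = [maxs[i] - mins[i] + vacuum for i in range(3)]
    # Move the center of the bounding box to the center of the cell
    shifted = [
        tuple(c[i] - 0.5 * (mins[i] + maxs[i]) + 0.5 * cell[i] for i in range(3))
        for c in coords
    ]
    return cell, shifted


# Hypothetical three-atom geometry in angstrom (placeholder, not a structure from the paper)
cell, shifted = center_in_orthorhombic_cell([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)])
```

In practice the adequacy of the vacuum would be checked by confirming that the interaction energy does not change when the cell is enlarged.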
Data availability
The data supporting the findings of this study are available within the paper and its supplementary material. Primary numerical data, e.g., CCSD(T) or FN-DMC energies of molecules, are available from P.R.N. and Y.S.A. upon reasonable request.
References
Carter, E. A. Challenges in Modeling Materials Properties Without Experimental Input. Science 321, 800–803 (2008).
Dubecký, M. et al. Quantum Monte Carlo methods describe noncovalent interactions with subchemical accuracy. J. Chem. Theory Comput. 9, 4287–4292 (2013).
Yang, J. et al. Ab initio determination of the crystalline benzene lattice energy to subkilojoule/mole accuracy. Science 345, 640–643 (2014).
Reilly, A. M. et al. Report on the sixth blind test of organic crystal structure prediction methods. Acta Crystallogr. Sec. B Struct. Sci. Crystal Eng. Mater. 72, 439–459 (2016).
Müller-Dethlefs, K. & Hobza, P. Noncovalent Interactions: a Challenge for Experiment and Theory. Chem. Rev. 100, 143–168 (2000).
Wang, Y. et al. Accelerating the discovery of insensitive highenergydensity materials by a materials genome approach. Nat. Commun. 9, 2444 (2018).
Lee, Y. et al. High-Throughput Screening Approach for Nanoporous Materials Genome Using Topological Data Analysis: application to Zeolites. J. Chem. Theory Comput. 14, 4427–4437 (2018).
Ongari, D., Yakutovich, A. V., Talirz, L. & Smit, B. Building a Consistent and Reproducible Database for Adsorption Evaluation in CovalentOrganic Frameworks. ACS Cent. Sci. 5, 1663–1675 (2019).
Al-Hamdani, Y. S. & Tkatchenko, A. Understanding noncovalent interactions in larger molecular complexes from first principles. J. Chem. Phys. 150, 010901 (2019).
Liao, K., Li, X.-Z., Alavi, A. & Grüneis, A. A comparative study using state-of-the-art electronic structure theories on solid hydrogen phases under high pressures. npj Comput. Mater. 5, 110 (2019).
Raghavachari, K., Trucks, G. W., Pople, J. A. & Head-Gordon, M. A fifth-order perturbation comparison of electron correlation theories. Chem. Phys. Lett. 157, 479–483 (1989).
Nagy, P. R., Samu, G. & Kállay, M. Optimization of the linearscaling local natural orbital CCSD(T) method: improved algorithm and benchmark applications. J. Chem. Theory Comput. 14, 4193 (2018).
Nagy, P. R. & Kállay, M. Approaching the basis set limit of CCSD(T) energies for large molecules with local natural orbital coupledcluster methods. J. Chem. Theory Comput. 15, 5275–5298 (2019).
Shavitt, I. & Bartlett, R. ManyBody Methods in Chemistry and Physics: MBPT and CoupledCluster Theory. Cambridge Molecular Science (Cambridge University Press, 2009).
Foulkes, W. M. C., Mitas, L., Needs, R. J. & Rajagopal, G. Quantum Monte Carlo simulations of solids. Rev. Mod. Phys. 73, 33–83 (2001).
Dubecký, M., Mitas, L. & Jurečka, P. Noncovalent Interactions by Quantum Monte Carlo. Chem. Rev. 116, 5188–5215 (2016).
Mostaani, E., Drummond, N. D. & Fal’ko, V. I. Quantum Monte Carlo Calculation of the Binding Energy of Bilayer Graphene. Phys. Rev. Lett. 115, 115501 (2015).
Santra, B. et al. Hydrogen bonds and van der waals forces in ice at ambient and high pressures. Phys. Rev. Lett. 107, 185701 (2011).
Zen, A. et al. Fast and accurate quantum Monte Carlo for molecular crystals. Proc. Natl. Acad. Sci. 115, 1724–1729 (2018).
Al-Hamdani, Y. S., Alfè, D. & Michaelides, A. How strongly do hydrogen and water molecules stick to carbon nanomaterials? J. Chem. Phys. 146, 094701 (2017).
Brandenburg, J. G. et al. Physisorption of water on graphene: subchemical accuracy from manybody electronic structure methods. J. Phys. Chem. Lett. 10, 358–368 (2019).
Zen, A. et al. Toward accurate adsorption energetics on clay surfaces. J. Phys. Chem. C 120, 26402–26413 (2016).
Ambrosetti, A., Ferri, N., DiStasio Jr., R. A. & Tkatchenko, A. Wavelike charge density fluctuations and van der waals interactions at the nanoscale. Science 351, 1171–1176 (2016).
Jordan, K. D. & Heßelmann, A. Comment on "Physisorption of Water on Graphene: Subchemical Accuracy from Many-Body Electronic Structure Methods". J. Phys. Chem. C 123, 10163–10165 (2019).
Jenness, G. R., Karalti, O. & Jordan, K. D. Benchmark calculations of wateracene interaction energies: extrapolation to the watergraphene limit and assessment of dispersioncorrected DFT methods. Phys. Chem. Chem. Phys. 12, 6375–6381 (2010).
Nguyen, B. D. et al. Divergence of manybody perturbation theory for noncovalent interactions of large molecules. J. Chem. Theory Comput. 16, 2258–2273 (2020).
Rezáč, J., Riley, K. E. & Hobza, P. S66: a wellbalanced database of benchmark interaction energies relevant to biomolecular structures. J. Chem. Theory Comput. 7, 2427–2438 (2011).
Sedlak, R. et al. Accuracy of quantum chemical methods for large noncovalent complexes. J. Chem. Theory Comput. 9, 3364–3374 (2013).
Antoine, R. et al. Direct measurement of the electric polarizability of isolated C60 molecules. J. Chem. Phys. 110, 9771–9772 (1999).
Sadhukhan, M. & Tkatchenko, A. Longrange repulsion between spatially confined van der waals dimers. Phys. Rev. Lett. 118, 210402 (2017).
Stöhr, M., Sadhukhan, M., Al-Hamdani, Y. S., Hermann, J. & Tkatchenko, A. Coulomb interactions between dipolar quantum fluctuations in van der Waals bound molecules and materials. Nat. Commun. 12, 137 (2021).
Caldeweyher, E. et al. A generally applicable atomiccharge dependent london dispersion correction. J. Chem. Phys. 150, 154122 (2019).
Ambrosetti, A., Reilly, A. M., DiStasio Jr., R. A. & Tkatchenko, A. Longrange correlation energy calculated from coupled atomic response functions. J. Chem. Phys. 140, 18A508 (2014).
Hermann, J., DiStasio, R. A. & Tkatchenko, A. FirstPrinciples Models for van der Waals Interactions in Molecules and Materials: Concepts, Theory, and Applications. Chem. Rev. 117, 4714–4758 (2017).
Grimme, S., Hansen, A., Brandenburg, J. G. & Bannwarth, C. DispersionCorrected MeanField Electronic Structure Methods. Chem. Rev. 116, 5105–5154 (2016).
Zen, A., Brandenburg, J. G., Michaelides, A. & Alfè, D. A new scheme for fixed node diffusion quantum Monte Carlo with pseudopotentials: Improving reproducibility and reducing the trialwavefunction bias. J. Chem. Phys. 151, 134105 (2019).
Casula, M., Moroni, S., Sorella, S. & Filippi, C. Sizeconsistent variational approaches to nonlocal pseudopotentials: standard and lattice regularized diffusion Monte Carlo methods revisited. J. Chem. Phys. 132, 154113 (2010).
Casula, M. Beyond the locality approximation in the standard diffusion Monte Carlo method. Phys. Rev. B 74, 161102 (2006).
Mitáš, L., Shirley, E. L. & Ceperley, D. M. Nonlocal pseudopotentials and diffusion Monte Carlo. J. Chem. Phys. 95, 3467–3475 (1991).
Zen, A., Sorella, S., Gillan, M. J., Michaelides, A. & Alfè, D. Boosting the accuracy and speed of quantum Monte Carlo: size consistency and time step. Phys. Rev. B 93, 241118 (2016).
Umrigar, C. J., Nightingale, M. P. & Runge, K. J. A diffusion Monte Carlo algorithm with very small timestep errors. J. Chem. Phys. 99, 2865–2890 (1993).
Needs, R. J., Towler, M. D., Drummond, N. D., López Ríos, P. & Trail, J. R. Variational and diffusion quantum monte carlo calculations with the casino code. J. Chem. Phys. 152, 154106 (2020).
Kim, J. et al. QMCPACK: an open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules and solids. J. Phys. Condens. Matter 30, 195901 (2018).
Nakano, K. et al. TurboRVB: a manybody toolkit for ab initio electronic simulations by quantum Monte Carlo. J. Chem. Phys. 152, 204121 (2020).
Al-Hamdani, Y. S. et al. Properties of the water to boron nitride interaction: from zero to two dimensions with benchmark accuracy. J. Chem. Phys. 147, 044710 (2017).
Booth, G. H., Grüneis, A., Kresse, G. & Alavi, A. Towards an exact description of electronic wavefunctions in real solids. Nature 493, 365–370 (2013).
Kottmann, J. S. & Bischoff, F. A. CoupledCluster in real space. 1. CC2 ground state energies using multiresolution analysis. J. Chem. Theory Comput. 13, 5945–5955 (2017).
Riplinger, C., Sandhoefer, B., Hansen, A. & Neese, F. Natural triple excitations in local coupled cluster calculations with pair natural orbitals. J. Chem. Phys. 139, 134101 (2013).
Ma, Q. & Werner, H.J. Explicitly correlated local coupledcluster methods using pair natural orbitals. Wiley Interdiscip. Rev. Comput. Mol. Sci. 8, e1371 (2018).
Schmitz, G., Hattig, C. & Tew, D. P. Explicitly correlated PNOMP2 and PNOCCSD and their application to the S66 set and large molecular systems. Phys. Chem. Chem. Phys. 16, 22167–22178 (2014).
Pavošević, F. et al. Sparsemaps—A systematic infrastructure for reduced scaling electronic structure methods. V. Linear scaling explicitly correlated coupledcluster method with pair natural orbitals. J. Chem. Phys. 146, 174108 (2017).
Ballesteros, F., Dunivan, S. & Lao, K. U. Coupled cluster benchmarks of large noncovalent complexes: the L7 dataset as well as DNA–ellipticine and buckycatcher–fullerene. J. Chem. Phys. 154, 154104 (2021).
Benali, A., Shin, H. & Heinonen, O. Quantum Monte Carlo benchmarking of large noncovalent complexes in the L7 benchmark set. J. Chem. Phys. 153, 194113 (2020).
Deible, M. J., Kessler, M., Gasperich, K. E. & Jordan, K. D. Quantum Monte Carlo calculation of the binding energy of the beryllium dimer. J. Chem. Phys. 143, 084116 (2015).
Flöser, B. M., Guo, Y., Riplinger, C., Tuczek, F. & Neese, F. Detailed pair natural orbitalbased coupled cluster studies of spin crossover energetics. J. Chem. Theory Comput. 16, 2224–2235 (2020).
Ajala, A. O., Voora, V., Mardirossian, N., Furche, F. & Paesani, F. Assessment of Density Functional Theory in Predicting Interaction Energies between Water and Polycyclic Aromatic Hydrocarbons: from Water on Benzene to Water on Graphene. J. Chem. Theory Comput. 15, 2359–2374 (2019).
Řezáč, J., Dubecký, M., Jurečka, P. & Hobza, P. Extensions and applications of the A24 data set of accurate interaction energies. Phys. Chem. Chem. Phys. 17, 19268 (2015).
Tsatsoulis, T. et al. A comparison between quantum chemistry and quantum monte carlo techniques for the adsorption of water on the (001) lih surface. J. Chem. Phys. 146, 204108 (2017).
Kesharwani, M. K., Karton, A., Sylvetsky, N. & Martin, J. M. L. The S66 noncovalent interactions benchmark reconsidered using explicitly correlated methods near the basis set limit. Aust. J. Chem. 71, 238 (2018).
Azadi, S. & Cohen, R. E. Chemical accuracy from quantum Monte Carlo for the benzene dimer. J. Chem. Phys. 143, 104301 (2015).
Sorella, S., Casula, M. & Rocca, D. Weak binding between two aromatic rings: feeling the van der Waals attraction by quantum Monte Carlo methods. J. Chem. Phys. 127, 14105 (2007).
Gasperich, K. & Jordan, K. D. Diffusion Monte Carlo Study of the Parallel Displaced Form of the Benzene Dimer. In Recent Progress in Quantum Monte Carlo, vol. 1234 of ACS Symposium Series, 107–117 (American Chemical Society, 2016).
Grimme, S. Supramolecular binding thermodynamics by dispersion-corrected density functional theory. Chem. Eur. J. 18, 9955–9964 (2012).
Sure, R. & Grimme, S. Comprehensive Benchmark of Association (Free) Energies of Realistic HostGuest Complexes. J. Chem. Theory Comput. 11, 3785–3801 (2015).
Hermann, J., Alfè, D. & Tkatchenko, A. Nanoscale ππ Stacked molecules are bound by collective charge fluctuations. Nat. Commun. 8, 14052 (2017).
Calbo, J., Ortí, E., Sancho-García, J. C. & Aragó, J. Accurate treatment of large supramolecular complexes by double-hybrid density functionals coupled with nonlocal van der Waals corrections. J. Chem. Theory Comput. 11, 932–939 (2015).
Christensen, A. S., Elstner, M. & Cui, Q. Improving intermolecular interactions in DFTB3 using extended polarization from chemicalpotential equalization. J. Chem. Phys. 143, 084123 (2015).
Brandenburg, J. G., Bannwarth, C., Hansen, A. & Grimme, S. B973c: a revised lowcost variant of the B97D density functional method. J. Chem. Phys. 148, 064104 (2018).
CarterFenk, K., Lao, K. U., Liu, K.Y. & Herbert, J. M. Accurate and efficient ab initio calculations for supramolecular complexes: Symmetryadapted perturbation theory with manybody dispersion. J. Phys. Chem. Lett. 10, 2706–2714 (2019).
Chen, J.L., Sun, T., Wang, Y.B. & Wang, W. Toward a less costly but accurate calculation of the CCSD(T)/CBS noncovalent interaction energy. J. Comput. Chem. 41, 1252 (2020).
Kendall, R. A., Dunning Jr., T. H. & Harrison, R. J. Electron affinities of the firstrow atoms revisited. Systematic basis sets and wave functions. J. Chem. Phys. 96, 6796 (1992).
Helgaker, T., Klopper, W., Koch, H. & Noga, J. Basisset convergence of correlated calculations on water. J. Chem. Phys. 106, 9639 (1997).
Boys, S. F. & Bernardi, F. The calculation of small molecular interactions by the differences of separate total energies. Some procedures with reduced errors. Mol. Phys. 19, 553–566 (1970).
Kállay, M. et al. The MRCC program system: accurate quantum chemistry from water to proteins. J. Chem. Phys. 152, 074107 (2020).
Morales, M. A., McMinis, J., Clark, B. K., Kim, J. & Scuseria, G. E. Multideterminant Wave Functions in Quantum Monte Carlo. J. Chem. Theory Comput. 8, 2181–2188 (2012).
Scemama, A., Applencourt, T., Giner, E. & Caffarel, M. Quantum Monte Carlo with very large multideterminant wavefunctions. J. Comput. Chem. 37, 1866–1875 (2016).
Genovese, C., Meninno, A. & Sorella, S. Assessing the accuracy of the Jastrow antisymmetrized geminal power in the H 4 model system. J. Chem. Phys. 150, 084102 (2019).
Trail, J. R. & Needs, R. J. Smooth relativistic HartreeFock pseudopotentials for H to Ba and Lu to Hg. J. Chem. Phys. 122, 174109 (2005).
Trail, J. R. & Needs, R. J. Normconserving HartreeFock pseudopotentials and their asymptotic behavior. J. Chem. Phys. 122, 014112 (2005).
Pfau, D., Spencer, J. S., Matthews, A. G. D. G. & Foulkes, W. M. C. Ab initio solution of the manyelectron Schrödinger equation with deep neural networks. Phys. Rev. Res. 2, 033429 (2020).
Hermann, J., Schätzle, Z. & Noé, F. Deepneuralnetwork solution of the electronic Schrödinger equation. Nat. Chem. 12, 891–897 (2020).
Gordon, M. (ed.) Fragmentation: toward Accurate Calculations on Complex Molecular Systems (Wiley, 2017).
Piecuch, P., Kucharski, S. A., Kowalski, K. & Musial, M. Efficient computer implementation of the renormalized coupledcluster methods: The RCCSD[T], RCCSD(T), CRCCSD[T], and CRCCSD(T) approaches. Comput. Phys. Commun. 149, 71–96 (2002).
Kállay, M. & Gauss, J. Approximate treatment of higher excitations in coupledcluster theory. J. Chem. Phys. 123, 214105 (2005).
Gonthier, J. F. & Head-Gordon, M. Assessing electronic structure methods for long-range three-body dispersion interactions: Analysis and calculations on well-separated metal atom trimers. J. Chem. Theory Comput. 15, 4351–4361 (2019).
Frey, J. A., Holzer, C., Klopper, W. & Leutwyler, S. Experimental and theoretical determination of dissociation energies of dispersiondominated aromatic molecular complexes. Chem. Rev. 116, 5614–5641 (2016).
Kawase, T., Tanaka, K., Fujiwara, N., Darabi, H. R. & Oda, M. Complexation of a Carbon Nanoring with Fullerenes. Angew. Chem. Int. Ed. 42, 1624–1628 (2003).
Řezáč, J. et al. Quantum Chemical Benchmark Energy and Geometry Database for Molecular Clusters and Complex Molecular Systems (www.begdb.com): a Users Manual and Examples. Collect. Czech. Chem. Commun. 73, 1261–1270 (2008).
Iwamoto, T., Watanabe, Y., Sadahiro, T., Haino, T. & Yamago, S. Size-selective encapsulation of C60 by [10]cycloparaphenylene: Formation of the shortest fullerene-peapod. Angew. Chem. Int. Ed. 50, 8342–8344 (2011).
Antony, J., Sure, R. & Grimme, S. Using dispersioncorrected density functional theory to understand supramolecular binding thermodynamics. Chem. Commun. 51, 1764–1774 (2015).
Li, W., Piecuch, P., Gour, J. R. & Li, S. Local correlation calculations using standard and renormalized coupledcluster approaches. J. Chem. Phys. 131, 114109 (2009).
Friedrich, J., Coriani, S., Helgaker, T. & Dolg, M. Implementation of the incremental scheme for oneelectron firstorder properties in coupledcluster theory. J. Chem. Phys. 131, 154102 (2009).
Nagy, P. R. & Kállay, M. Optimization of the linearscaling local natural orbital CCSD(T) method: redundancyfree triples correction using Laplace transform. J. Chem. Phys. 146, 214106 (2017).
GyeviNagy, L., Kállay, M. & Nagy, P. R. Integraldirect and parallel implementation of the CCSD(T) method: algorithmic developments and largescale applications. J. Chem. Theory Comput. 16, 336–384 (2020).
Needs, R. J., Towler, M. D., Drummond, N. D. & López Ríos, P. Continuum variational and diffusion quantum Monte Carlo calculations. J. Phys. Condens. Matter 22, 023201 (2010).
Giannozzi, P. et al. QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys. Condens. Matter 21, 395502 (2009).
Trail, J. R. & Needs, R. J. Shape and energy consistent pseudopotentials for correlated electron systems. J. Chem. Phys. 146, 204107 (2017).
Acknowledgements
We thank HPC staff for their support and access to the IRIS cluster at the University of Luxembourg and to the DECI resource Saga based in Norway at Trondheim with support from the PRACE aisbl (NN9914K). Y.S.A. acknowledges funding from NIH grant number R01GM118697 and is supported by The National Centre of Competence in Research (NCCR) Materials Revolution: Computational Design and Discovery of Novel Materials (MARVEL) of the Swiss National Science Foundation (SNSF). The work of P.R.N. is supported by the ÚNKP-19-4 and ÚNKP-20-5 New National Excellence Program of the Ministry for Innovation and Technology and the János Bolyai Research Scholarship of the Hungarian Academy of Sciences. J.G.B. acknowledges support from the Alexander von Humboldt foundation. M.K. is grateful for the financial support from the National Research, Development, and Innovation Office (NKFIH, Grant No. KKP126451) and the NRDI Fund (TKP2020 IES, Grant No. BME-IE-BIO) based on the charter of bolster issued by the NRDI Office under the auspices of the Ministry for Innovation and Technology. A.T. acknowledges financial support from the European Research Council (ERC-CoG grant BeStMo) and Fonds National de la Recherche Luxembourg (FNR Grant INTER/DFG/18/12944860). A.Z. acknowledges financial support from the Leverhulme Trust, grant number RPG-2020-038. Calculations were also performed on the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/P020259/1), and DiRAC funding from the Science and Technology Facilities Council (www.dirac.ac.uk).
Author information
Contributions
Y.S.A. and P.R.N. contributed equally to this work. The main investigation was conducted by P.R.N. (coupled cluster calculations) and Y.S.A. (quantum Monte Carlo calculations). J.G.B. performed PBE0+D4 calculations and supporting validation calculations. D.B. performed PBE0+MBD calculations. A.Z. performed QMC calculations on the S66 complexes. The work was conceptualized by Y.S.A. and A.T. with additional contributions from J.G.B., P.R.N. and A.Z. Software development to expand the application of the LNO-CCSD(T) method in this work was conducted by P.R.N. and M.K. P.R.N. performed formal analysis to obtain uncertainty estimates from LNO-CCSD(T) data. The original draft of the paper was written by Y.S.A. and P.R.N. Additional review and editing of the paper was undertaken by J.G.B., A.T., M.K. and A.Z. Project administration was led by Y.S.A. with contributions from J.G.B. and A.T. J.G.B. and A.T. supervised the work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks Roberto Peverati and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Al-Hamdani, Y.S., Nagy, P.R., Zen, A. et al. Interactions between large molecules pose a puzzle for reference quantum mechanical methods. Nat Commun 12, 3927 (2021). https://doi.org/10.1038/s41467-021-24119-3
DOI: https://doi.org/10.1038/s41467-021-24119-3