Interactions between large molecules pose a puzzle for reference quantum mechanical methods

Quantum-mechanical methods are used for understanding molecular interactions throughout the natural sciences. Quantum diffusion Monte Carlo (DMC) and coupled cluster with single, double, and perturbative triple excitations [CCSD(T)] are state-of-the-art trusted wavefunction methods that have been shown to yield accurate interaction energies for small organic molecules. These methods provide valuable reference information for widely-used semi-empirical and machine learning potentials, especially where experimental information is scarce. However, agreement for systems beyond small molecules is a crucial remaining milestone for cementing the benchmark accuracy of these methods. We show that CCSD(T) and DMC interaction energies are not consistent for a set of polarizable supramolecules. Whilst there is agreement for some of the complexes, in a few key systems disagreements of up to 8 kcal mol−1 remain. These findings thus indicate that more caution is required when aiming at reproducible non-covalent interactions between extended molecules.

The crucial challenge lies in extensively accounting for relatively small fluctuations in the electron charge densities. In FN-DMC this implies the need for relatively small time-steps ∆τ for the projection of the wavefunction as well as an extensive sampling of electron configurations in real-space (lim τ →∞ ) in order to reduce the stochastic noise in the predicted energy. In coupled cluster theory, non-covalent interactions require a high-order treatment of many-electron processes, as is included in CCSD(T), and a sufficiently large single-particle basis set. Reaching basis set saturation and well-controlled local approximations concurrently for the studied systems required previously unfeasible computational efforts as shown by the several kcal mol −1 scatter of interaction energy predictions reported for the L7 set (see Fig. S3). Our recent efforts enabled the following: (i) a systematically converging series of local CCSD(T) results is presented for highly-complicated complexes, (ii) both the local and the basis set incompleteness (BSI) errors are closely monitored using comprehensive uncertainty measures [14], (iii) convergence up to chemical accuracy is reached for the complete L7 set concurrently in the local approximations as well as in the basis set saturation.
The benefit of such demanding convergence studies is that the resulting interaction energies, up to the respective error bars, can be considered independent of the corresponding approximations. Consequently, we expect that the CBS limit of the exact CCSD(T) results could, in principle, be approached similarly using alternative basis sets [10,30,31] or local correlation methods [32][33][34][35], as it is clearly observed for some of the present complexes (see, e.g. GGG or CBH in Fig. S3).
We use highly-optimized algorithms both for FN-DMC and CCSD(T) as outlined in Methods, and push them beyond the typically applied limits. We used circa 0.7 and 1 million CPU core hours for FN-DMC and CCSD(T), respectively. This is equivalent to running a modern 28 core machine constantly for ∼ 7 years.

Losing consensus on supramolecular interactions
Demonstrating agreement between fundamentally different electronic structure methods for solving the Schrödinger equation provides a proof-of-principle for the accuracy of the methods beyond technical challenges. To date, disagreements between CCSD(T) and FN-DMC have been reported only for systems where their key assumptions, e.g. single-reference wavefunction, accurate node-structure, etc. were not completely fulfilled [36,37]. Previously however, CCSD(T) and FN-DMC were found in agreement within the error bars, for the interaction energies of small organic molecules with pure dynamic correlation [9,16,38] as well as some extended systems [11,17,39]. Establishing this agreement for systems at the 100 atom range has, however, been hindered by the sizable or unavailable error estimates for finite systems [9]. For example, binding energies of large host-guest complexes derived from experimental association free energies [40,41] motivated previous FN-DMC [42] as well as local CCSD(T) [43] computations. However, conclusive remarks could not be made on the consistency of FN-DMC and local CCSD(T) on these complexes due to technical difficulties and unavailable uncertainty estimates for local CCSD(T), and large error estimates on both experimental and FN-DMC energies (up to a few kcal mol −1 ).
Here, we consider similar but somewhat smaller supramolecular complexes (Fig. S2) and obtain tightly converged local CCSD(T) and FN-DMC results sufficient for rigorous comparisons (see Fig. S3 and Table SI). The level of uncertainty in our results is indicated by stochastic error bars for FN-DMC and the sum of local and BSI error estimates for CCSD(T).
The complexes are arranged in Fig. S3 according to increasing interaction strength, which roughly scales with the size of the interacting surface. CCSD(T) and FN-DMC agree on the interaction energy to within 0.5 kcal mol −1 , taking error bars into account, for a subset of the complexes we consider: GGG, CBH, GCGC, C3A and PHE. These complexes are between 48 and 112 atoms in size and exhibit π − π stacking, hydrogen bonding, and dispersion interactions. Therefore, the agreement for these five complexes indicates their absolute interaction energies are established references and can be used to benchmark other methods for large molecules. Here, relative differences of very small interaction energies have to be interpreted carefully as they are sensitive to the uncertainty estimates. In GGG for example, ∆ min is 0.1 kcal mol −1 whilst the relative disagreement lies between 3% and 50%.
In contrast, the relative disagreement between FN-DMC and CCSD(T) is better resolved in the more strongly interacting C 60 @ [6]CPPA complex, at 20-31%.   A salient and surprising finding is the disagreement between state-of-the-art methods on the interaction energy of three non-trivial complexes: coronene dimer (C2C2PD), circumcoronene-GC base pair (C3GC), and buckyball-ring (C 60 @[6]CPPA). Considering the error bars, the minimum differences (∆ min ), as indicated in Table SI  C2C2PD has attracted the most attention to date in the CCSD(T) context as it represents a stepping stone between two widely studied systems: benzene dimer and graphene bilayer [9]. Already C2C2PD has posed a significant challenge to various local CCSD(T) methods due to its slowly-decaying long-range interactions [14,33,35,[44][45][46][47]. Considerable efforts have been devoted recently [14,33,35] [23], and various local CCSD (T) approaches [33,35,[44][45][46][47]]. The yellow bars indicate the delta value (∆ min ) which is the minimum difference between best converged CCSD(T) and FN-DMC, given by the estimated and stochastic error bars, respectively.

Distinct errors using DNA base molecules on circumcoronene
The C3GC and C3A complexes are ideal for assessing the convergence of CCSD (T) and FN-DMC, due to their chemical similarity and importance of π − π stacking interactions, i.e. nucleobases stacked on circumcoronene. CCSD(T) and FN-DMC agree within 1 kcal mol −1 for the interaction energy of C3A, whereas there is a notable disagreement of at least 2.9 kcal mol −1 in the interaction energy of C3GC. Interestingly, both systems involve similar interaction mechanisms, with C3GC exhibiting both stacking and hydrogen-bonding interactions.
CCSD(T) and FN-DMC interaction energies involve multiple approximations. Known sources of error to consider in our FN-DMC calculations are: • The fixed-node approximation which restricts the nodal-structure to that of the input guiding wavefunction.
• Time-step bias from the discretization of imaginary time for propagating the wavefunction.
• Pseudopotentials to approximate core electrons for each atom.
• Non-uniform quality of optimized trial wavefunctions for fragments and bound complex at larger time-steps.
In obtaining CCSD(T) interaction energies, the sources of error are: • Local approximations of long-range electron correlation according to the LNO scheme.
• Single-particle basis representation of the CCSD(T) wavefunction.
In Fig. S4 we analyse the most critical approximations for each method on the example of the C3A and C3GC complexes, and we also consider the other remaining known sources of error in Methods.
For the single-particle basis representation in CCSD(T) we employed conventional correlation-consistent basis sets augmented with diffuse functions [48], aug-cc-pVXZ (X=T, Q, and 5) as shown in panel a) of Fig. S4. The remaining BSI is alleviated using extrapolation [49] toward the complete basis set (CBS) limit [CBS(X,X + 1), X=T, Q], and counterpoise (CP) corrections [50]. The local errors decrease systematically as the LNO threshold sets are tightened (Normal, Tight, very Tight) enabling extrapolations, e.g. Normal-Tight (N-T), to estimate the canonical CCSD(T) interaction energy [14] (see panel b) of Fig. S4).
Exploiting the systematic convergence properties, an upper bound for both the local and the BSI errors can be given [51].
Benchmarks presented previously for energy differences of a broad variety of systems showed excellent overall accuracy at the Normal-Tight extrapolated LNO-CCSD (T)/CBS(T,Q) level (M1) [14]. However, the BSI error bar of 1.0 kcal mol −1 and the local error bar of FN-DMC has the advantage that the wavefunction is sampled in real-space without the need for the numerical representation of many-particle basis states thus reducing sensitivity to the single-particle basis set. Instead, pertinent sources of error in FN-DMC which we assess in Fig. S4 are the effects of the fixed-node approximation and the time-step bias.
Note that these sources of error are not included in the FN-DMC stochastic error bars.
First, the different nodal surfaces from these DFT methods serve to indicate the dependence of the FN-DMC interaction energy on the nodal structure. Indeed, from Fig. S4, we find no indication that the FN-DMC interaction energies of C3GC is affected by the nodal structure outside of the stochastic error bars. Second, FN-DMC energies are sensitive to the time-step and we rely on recent improvements in FN-DMC algorithms [53,54], that enable convergence of time-steps as large as 0.05 a.u. We used 0.03 a.u. and 0.01 a.u. time-steps to compute the interaction energies of C3A and C3GC. Fig. S4 indicates that the interaction energy is unchanged for C3A and C3GC (within the stochastic error) for the different time-steps considered here. The time-step and fixed-node approximations perform similarly well for the coronene dimer and the buckyball-ring complex (see Section S2 of the SM).
Open challenges for next generation of many-body methods organic dimers [9,16,38], molecular crystals [17], and small physisorbed molecules on surfaces [11,39]. Indeed, we also find good agreement in the absolute interaction energies for five of the eight complexes considered here. However, we find that the disagreement by several kcal mol −1 in C 60 @[6]CPPA particularly, cannot be explained by the controllable sources of error. While both methods are highly sophisticated, they are still approximations to the exact solution of the many-electron Schrödinger equation. Moreover, there can be non-trivial coupling between approximations within each method, which remain poorly understood for complex many-electron wavefunctions. Here, we estimate the magnitude of additional approximations which are generally regarded as even more robust and contemplate potential strategies for improvements.

Are we there yet with FN-DMC?
The reported interaction energies of C2C2PD, C3GC, and C 60 @[6]CPPA indicate that FN-DMC stabilizes the interacting complexes more weakly than CCSD(T). Therefore, one possibility for the discrepancy between the methods is that FN-DMC (as applied here) does not capture the correlation energy in the bound complexes sufficiently. Reasons for this can include the fixed-node approximation and more generally, insufficient flexibility in the wavefunction ansatz.
The Slater-Jastrow ansatz was applied here using a single determinant combined with a Jastrow factor containing explicit parameterizable functions to describe electron-electron, electron-nucleus, and electron-electron-nucleus interactions. We have evaluated FN-DMC interaction energies for different nodal structures for C3GC, C2C2PD, C 60 @ [6]CPPA and in all cases the FN-DMC interaction energies are in 1-σ agreement (see Section S2 of SM) with stochastic errors that are mostly under 1 kcal mol −1 . Among these systems, the largest potential deviation (∆ max ) due to the fixed-node error is estimated to be ∼3.7 kcal mol −1 in C 60 @[6]CPPA. Although this potentially large source of error is not enough to explain the 8.3 kcal mol −1 ∆ min disagreement with CCSD(T), it remains a pertinent issue for establishing chemical accuracy. Reducing the fixed-node error, for example by using more than one Slater determinant to systematically improve the nodal structure, in such large molecules remains challenging [55,56]. Promising alternatives include the Jastrow antisymmetrized geminal power approach which has recently been shown to recover near-exact results for a small, strongly correlated cluster of hydrogen atoms [57].
The Jastrow factor is a convenient approach to increase the efficiency of FN-DMC since in the zero time-step limit and with sufficient sampling, the FN-DMC energy is independent of this term. However, the quality of the Jastrow factor can be non-uniform for the bound complex and the non-interacting fragments, which can introduce a bias at larger time-steps. The recent DLA method in FN-DMC reduces this effect [54] and was applied to the C 60 @[6]CPPA complex reported in Table SI and also tested for GGG, C3A, and We estimate the error from the use of Trail and Needs pseudopotentials [58,59] in FN-DMC at the Hartree-Fock (HF) level using interaction energy of C2C2PD. We find 0.1 kcal mol −1 difference in the HF interaction energy with the employed pseudopotentials and without (i.e. all-electron) which is well within the acceptable uncertainty for our findings.
In addition, Zen et al. [17] have previously compared Trail and Needs pseudopotentials with correlated electron pseudopotentials [60] at the FN-DMC level using the binding energy of an ammonia dimer and found agreement within 0.1 kcal mol −1 .
In principle, a more flexible wavefunction ansatz allows a more accurate many-body wavefunction to be reached in DMC, thus recovering electron correlation more effectively.
To this end, recently introduced machine learning approaches [61,62] are promising but more expensive due to the considerable increase in parameters. However once feasible, a systematic assessment of the amount of electron correlation recovered by these different ansatze in non-covalently bound systems will bring valuable insight to the current puzzle.

Potential avenues for improvement upon CCSD(T)
Considering the complexes exhibiting significant π-π interactions, CCSD(T) is found to predict stronger interaction than FN-DMC. As some of the individually negligible but collectively important long-range interactions are estimated in local CC methods, these potentially overestimated interaction energy contributions could benefit from a higher-level theoretical description [33,63]. In the case of the LNO scheme, the majority of the local approximations have marginal effect on the interaction energies when very Tight settings are employed [14]. For instance, long-range interactions that do not benefit from the full CCSD(T) treatment add up to at most 2.9 kcal mol −1 for the interaction energy of The employed single-particle basis sets perform exceptionally well for CCSD(T) computations of small molecules [48,49], but approaching the CBS limit of CCSD(T) for large systems is mostly an uncharted territory in the literature [14,33]. The agreement of CP corrected CBS(T,Q), CBS(Q,5), and uncorrected CBS(Q,5) within 0.06-0.36 kcal mol −1 is highly satisfactory (see Sect. S1 B of SM). Furthermore, the CBS(5,6) results obtained with the aug-cc-pV6Z basis set for GGG are fully consistent with the CBS(Q,5) interaction energies (see Sect. S1 B of SM). CC methods exploiting explicitly correlated wavefunction forms [33,35] as well as alternatives to the conventional Gaussian basis sets [10,30,31] have emerged recently, which could provide independent verification of the systematic convergence studies performed here.
The higher-order contribution of three-, four-, etc. electron processes on top of CCSD(T) are usually found to be negligible for weakly-correlated molecules [38]. However, the available numerical experience is limited to complexes below about a dozen atoms, and for some highly polarizable systems the beyond CCSD(T) treatment of three-electron processes has been shown to contribute significantly to three-body dispersion [64]. The effect of core correlation is expected to be very small, thus we attempted to evaluate it independently from the numerical noise of the other approximations. For instance, the all electron interaction energy of C2C2PD is even stronger than the frozen core one at second-order by 0.2 kcal mol −1 (4.6 cal mol −1 per C atom). All in all, core and higherorder correlation effects appear to strengthen the CCSD(T) interaction energies and slightly increase the deviation compared to FN-DMC.

Insights from experiments and density-functional approximations
Experimental binding energies or association constants of supramolecular complexes are particularly valuable, when available, but also have their limitations as back-corrections are needed to separate the effects of thermal fluctuations and solvent effects for example [66].
In the case of C 60 @[6]CPPA for example, the association constant is measured in a benzene solution and indicates a stable encapsulated complex, but one which could not be well-characterised by X-ray crystallography; purportedly due to the rapid rotation of the buckyball guest [67]. Instead, a non-fully encapsulated structure was successfully characterized using toluene anchors on the buckyball. This demonstrates a number of physical leaps that exist between what can be measured and what can be accurately computed.
Other high-level methods, such as the full configuration interaction quantum Monte-Carlo (FCI-QMC) method [10,30], can be key to assessing the shortcomings from major approximations such as the fixed node approximation and static correlation. Once the severe scaling with system size associated with FCI-QMC and similar methods is addressed, larger molecules will become feasible. However, in the present time the lack of references in large systems remains a salient problem.
The scarcity of reference information has an impact on all other modelling methods, including density-functional approximations (DFAs), semi-empirical, force field or machine learning based models, etc. which are validated or parameterized based on higher-level benchmarks. In particular, there is a race to simulate larger, more anisotropic, and complex materials, accompanied by a difficulty of choice for modelling methods. To demonstrate the consequences of inconsistent references, Fig. S5 shows interaction energy discrepancies obtained with DFAs, PBE0+D4 [26] and PBE0+MBD [27], that are both designed to capture all orders of many-body dispersion interactions in different manner. Intriguingly, the PBE0+D4 method is in close agreement with CCSD(T) (mean absolute deviation, MAD=1.1 kcal mol −1 ), whereas PBE0+MBD is closer to FN-DMC (MAD=1.5 kcal mol −1 ) , but their performance is hard to characterize when CCSD(T) and FN-DMC disagree. Moreover, we decomposed the interaction energies from the DFAs into dispersion components and find that, for C 60 @[6]CPPA the main difference between PBE0+MBD and PBE0+D4 is 6.5 kcal mol −1 in the two-body dispersion contribution. Differences in beyond two-body dispersion interactions are smaller and at most 1.6 kcal mol −1 in C 60 @[6]CPPA.

DISCUSSION
Until now, disagreements between reference interaction energies of extended organic complexes have typically been ascribed to unconverged results due to practical bottlenecks.
Here, we report highly-converged results at the frontier of wavefunction based methods; uncovering a disconcerting level of disagreement in the interaction energy for three supramolec- The supramolecular complexes we report feature π − π stacking, hydrogen bonding, and intermolecular confinement, that are ubiquitous across natural and synthetic materials. Thus our immediate goals are to elucidate the sources of the underlying discrepancies and to explore the scope of systems where such deviations between reference wavefunction methods occur. Well-defined reference interaction energies and the better characterization of their predictive power have growing importance as they are frequently applied in chemistry, material, and biosciences. Our findings should motivate cooperative efforts between experts of computational and experimental methods in obtaining well-defined interaction energies and thereby extending the predictive power of first principles approaches across the board.

METHODS
The L7 structures have been defined by Sedlak et al. [23] and structures can be found on the begdb database [68]. Note that the interaction energy, E int , is defined with respect to two fragments even where the complex consists of more than two molecules (as in GGG, GCGC, PHE, and C3GC): where E com is the total energy of the full complex, and E 1 frag and E 2 frag are the total energies of isolated fragments 1 and 2, respectively. The fragment molecules have the same geometry as in the full complex, i.e. not relaxed. Further details on the configurations can be found in the SM and in Ref. 23.
The C 60 @[6]CPPA complex is based on similar complexes in previous theoretical and experimental works [42,69,70] and has been chosen to represent confined π-π interaction that are numerically still tractable by our methodologies. Its geometry has been symmetrized to D 3d point group, the individual fragments of C 60 and [6]CPPA are kept frozen (I h and D 6h , respectively). The structure is provided in the SM.

The local natural orbital CCSD(T) method
In order to reduce the N 7 -scaling of canonical CCSD(T) with respect to the system size (N ), the inverse sixth power decay of pairwise interactions can be exploited (local approximations) and the wavefunction can be compressed further via natural orbital (NO) techniques. [63] Building on such cost-reduction techniques a number of highly-efficient local CCSD(T) methods emerged in the past decade [13,14,[32][33][34][35]63]. As the local approximation free CCSD(T) energy can be approached by the simultaneous improvement of all local truncations in most of these techniques, in principle, all local CCSD(T) methods are expected to converge to the same interaction energy. Here we employ the local natural orbital CCSD(T) [LNO-CCSD(T)] scheme [13,71], which, for the studied systems, brings the feasibility of exceedingly well-converged CCSD(T) calculations in-line with FN-DMC. The approximations of the LNO scheme automatically adapt to the complexity of the underlying wavefunction and enable systematic convergence towards the exact CCSD(T) correlation energy, with up to 99.99% accuracy using sufficiently tight settings [14] .
The price of improvable accuracy is that the computational requirements can drastically increase depending on the nature of the wavefunction: while LNO-CCSD(T) has been successfully employed for macromolecules, such as small proteins at the 1000 atom range [13,14], sizable long-range interactions appearing in the here studied complexes pose a challenge for any local CCSD(T) method [14,32,33,35]. This motivated the implementation of several recent developments in our algorithm and computer code over the lifetime of this project, which cumulatively resulted in about 2-3 orders of magnitude decrease in the time-to-solution and data storage requirement of LNO-CCSD(T) [13,14,71], and made well-converged computations feasible for all complexes. For instance, we have designed a massively parallel conventional CCSD(T) code specifically for applications within the LNO scheme [72] and integrated it with our highly optimized LNO-CCSD(T) algorithms [13,14,71]. Here, we report the first large-scale LNO-CCSD(T) applications which exploit the resulted high performance capabilities using the most recent implementation of the Mrcc package [52] (release date February 22, 2020).

Computational details for CCSD(T)
The LNO-CCSD(T)-based CCSD(T)/CBS estimates were obtained as the average of CP-corrected and uncorrected ("half CP") [50], Tight-very Tight extrapolated LNO-CCSD(T)/CBS(Q,5) interaction energies [14]. Except for C3A, C3GC, and C 60 @[6]CPPA, the CBS(Q,5) notation refers to CBS extrapolation [49] using aug-cc-pVXZ basis sets [48] with X=Q and 5. For C3A, C3GC, and C 60 @[6]CPPA, a Normal LNO-CCSD(T)/CBS(Q,5)based BSI correction (∆ BSI ) was added to the Tight-very Tight extrapolated LNO-CCSD(T)/augcc-pVTZ interaction energies, exploiting the parallel convergence of the LNO-CCSD(T) energies for these basis sets [14]. Local error bars shown, e.g. on panel b) of Fig. S4 are obtained via the extrapolation scheme of Ref. [14]. Error bars accompanying the LNO-CCSD(T) interaction energies of Fig. S3 and Table SI, and determining the interval enclosed by the dashed lines on panels a) and b) of Fig. S4 are the sums of the BSI and local error estimates. The BSI error measure is the maximum of two separate error estimates: the difference between CP-corrected and uncorrected CBS(Q,5) energies, and the difference between CP-CBS(T,Q) and CP-CBS(Q,5) results. This BSI error bar is increased with an additional term if ∆ BSI is employed according to Sect. S1 B of the SM. The local error bar of the best estimated CCSD(T) results (see Table SII of the SM) is obtained from the difference of the Tight and very Tight LNO-CCSD(T) results evaluated with the largest possible basis sets [14].

Computational details for FN-DMC
Our FN-DMC calculations use the Slater-Jastrow ansatz with the single Slater determinants obtained from DFT. The Jastrow factor for each system contains explicit electronelectron, electron-nucleus, and three-body electron-electron-nucleus terms. The parameters of the Jastrow factor were optimized for each complex using the variational Monte Carlo (VMC) method and the varmin algorithm which allows for systematic improvement of the trial wavefunction, as implemented in CASINO v2.13.610 [73]. Note that bound complexes were used in the VMC optimizations and the resulting Jastrow factor was used to compute the corresponding fragments. All systems were treated in real-space as non-periodic open systems in VMC and FN-DMC.
We performed FN-DMC with the locality approximation (LA) for the non-local pseudopotentials [74] and 0.03 a.u. time-step for all L7 complexes. Smaller time-steps of 0.003 and 0.01 a.u. were also used to compute the interaction energy of the C2C2PD complex and the interaction energy was found to be in agreement within the stochastic error bars with all three time-steps.
The C 60 @[6]CPPA complex exhibited numerical instability using the standard LA. This prevented sufficient statistical sampling and therefore we computed this complex with two alternative and more numerically stable approaches. First, the energy reported in Fig. S3 and Table SI is using the recently developed determinant localization approximation (DLA) [54] implemented in CASINO v2.13.809 [73]. The DLA gives: (i) better numerical stability than the LA algorithm allowing for more statistics to be accumulated, (ii) smaller dependence on the Jastrow factor, and (iii) addresses an indirect issue related to the use of non-local pseudopotentials. Second, the T-move approximation [75] (without DLA) was also applied to C 60 @[6]CPPA for comparison. The T-move scheme is more numerically stable than the standard LA algorithm but is also more time-step dependent and therefore we used results from 0.01 and 0.02 a.u. time-steps to extrapolate the interaction energy to the zero time-step limit, as reported in SM. The extrapolated interaction energy with the T-move scheme is −31.14 ± 2.57 kcal mol −1 using LDA nodal structure and −29.16 ± 2.33 kcal mol −1 using PBE0 nodal structure. Due to the large stochastic error on these results, we report the better converged DLA-based interaction energy (with PBE0 nodal structure) in the main results, but we note that all three predictions from FN-DMC agree within the statistical error bars. Furthermore, as the DLA is less sensitive to the Jastrow factor at finite time-steps, we have also tested the interaction energies of GGG, C3A, and C2C2PD complexes, finding agreement with the LA-based FN-DMC results within one standard deviation. Further details can be found in the SM.
The initial DFT orbitals (which define the nodal structure in FN-DMC) were prepared using PWSCF in Quantum Espresso v.6.1 [76]. Trail and Needs pseudopotentials [58,59] were used for all elements, with a plane-wave energy cut-off of 500 Ry. The plane-wave representation of the molecular orbitals from PWSCF were expanded in terms of B-splines. Since PWSCF uses periodic boundary conditions, all complexes were centered in an orthorhombic unit cell with a vacuum spacing of ∼ 8Å in each Cartesian direction to ensure that the single-particle orbitals are fully enclosed. LDA orbitals were used for L7 complexes and in addition, PBE0 orbitals were also considered for C2C2PD, C3GC, and C 60 @

A. Convergence of local approximations
The LNO-CCSD(T) energy expression reformulates the CCSD(T) energy in terms of localized molecular orbitals (LMOs, i , j ) [13,71,77]: The convergence of all approximations in LNO-CCSD(T) can be assessed via the use of pre-defined threshold sets, which provide systematic improvement simultaneously for all approximations of the LNO scheme [13,14,71,[77][78][79][80]. In this series of threshold sets (Normal, Tight, very Tight), the accuracy determining cutoff parameters are tightened in an exponential manner [14]. For instance, the very Tight set collects an order of magnitude tighter truncation thresholds than those of the Normal set, which is the default choice.
The convergence behavior of the LNO-CCSD(T) interaction energies separates the studied complexes (see Fig. 2 of manuscript) into two groups. For GGG, PHE, CBH, and GCGC we observe rapid convergence toward the corresponding canonical CCSD(T) interaction energy as indicated, e.g. by the local error estimates collected in Table SII. The excellent convergence is apparent as the differences of the Tight and very Tight interaction energies are all in the 0.1-0.3 kcal mol −1 range for these four complexes. This uncertainty range is highly satisfactory for the local approximations considering that the estimated basis set incompleteness (BSI) errors for LNO-CCSD(T) are also comparable. Consequently, we perform an even more thorough analysis of the local errors for the re- Considering briefly the corresponding absolute energies collected in Sect. S1 C, one can observe that the LNO-CCSD(T) correlation energies are also sufficiently well converged. One can also consider internal convergence indicators besides the total energy. At the very Tight level, the π-π, π-σ, and also the majority of the σ-σ orbital interactions benefit from the full CCSD(T) treatment for all complexes. Additionally, none of the remaining weak electronic interactions, contributing only about 0.01% or lower portion of the correlation energy, are neglected, they are, however, approximated via second-order pair energies [13,14]. At the very Tight level, the orbital domains employed for the LNO-CCSD(T) treatment include all atoms, all atomic orbitals, and the majority of the correlating (virtual) space, spanned by, on the average, 80-95% of the orbitals of the entire complexes.

B. Single-particle basis set convergence
Regarding the convergence of the interaction energies with respect to the single particle basis set, we rely on approaches used routinely in wavefunction computations on small molecules. Dunning's correlation consistent basis sets [48] employed here are designed to systematically approach the complete basis set (CBS) limit with a polynomial convergence rate, which can be exploited to reduce the remaining basis set incompleteness (BSI) error via basis set extrapolation approaches [49,81]. We employ two-point formulae for extrapolation, labeled as CBS(X,X + 1), where X refers to the cardinal number of the aug-cc-pVXZ basis set [48] with X=T, Q, and 5.  Table SII were obtained by scaling the 0.12 kcal mol −1 with the ratio of the interaction energies of the given complex and C2C2PD.
The "Total error bar" values of Table SII also include this third, ∆ BSI related uncertainty estimate for these three complexes.
Even more details can be learned observing the convergence of the total HF and the LNO-CCSD(T) correlation energies separately for the complexes and monomers (see Sect. S1 C).
The HF total energies are converged to six significant digits at the CBS(Q,5) level, which translates into a highly convincing convergence level of 0.01 kcal mol −1 regarding the HF part of the interaction energies. In other words, that BSI error estimates collected in Table SII have negligible HF and sizable correlation contribution. Furthermore, the CP corrected interaction energies are converged up to this 0.01 kcal mol −1 level already with the smallest, aug-cc-pVTZ basis set. As expected, the CCSD(T) correlation energies tend significantly more slowly to the CBS limit with the cardinal number, but the agreement of the CBS(T,Q) and CBS(Q,5) values up to 4 significant digits is again highly satisfactory. This shows that the BSI error estimates being below 0.36 kcal mol −1 , just as the LNO error estimates, are consistent with the absolute energies and do not benefit from sizable error compensation.
Additionally, the computation of the interaction energies is warranted according to the supermolecular approach [see Eq. (1) of the main text], because total energies are converged to the necessary number of significant digits.

C. LNO-CCSD(T) energies plotted on Figs. S4. and S6
In Tables SIII-SV, we collect the absolute HF, the LNO-CCSD(T) correlation, and the corresponding interaction energies using all possible combinations of settings (Normal to very Tight, aug-cc-pVTZ to aug-cc-pV5Z, corresponding extrapolated energies, and various use of CP corrections) to document the numerical data plotted in Fig. S4 and S6. Additional analysis is provided in Sects. S1 A and S1 B.

D. Core and higher-order correlation on top of CCSD(T)
Core correlation effects are evaluated using the highly-optimized density-fitting (DF) MP2 implementation of the Mrcc package [52] using large basis sets. In that way, the magnitude of the frozen core approximation can be determined independently from the local and BSI errors. The augmented weighted core-valence basis sets [82], aug-cc-pwCVXZ with X=T and Q, were employed in combination with CP corrections. The core correlated DF-MP2 interaction energies of the C2C2PD complex, both with aug-cc-pwCVTZ and with aug-cc-pwCVQZ, as well as with CBS(T,Q), are consistently stronger by 0.22 kcal mol −1 (4.6 cal mol −1 per C atom) than those obtained using the frozen core approach and otherwise identical settings.
The missing higher-order electron correlation on top of the CCSD(T) treatment was estimated using the CCSDT(Q) scheme, which includes infinite-order three-electron and the perturbative four-electron contributions [65]. As the conventional ninth power-scaling CCSDT(Q) calculations are many-orders of magnitude more expensive than CCSD(T), we relied on the analogous LNO approximations implemented also for CCSDT(Q) [13,79] in the Mrcc package [52]. Clearly, both corrections are negligibly small compared to the deviation of CCSD(T) and FN-DMC. As the even higher-order CC terms are expected to be even smaller, it is unlikely that higher-order electron correlation effects missing from CCSD(T) could completely explain the disagreement of CCSD(T) and FN-DMC.
The weakly-correlated character of the studied system is also verified via the T1 [83] diagnostics. The T1 measures obtained for the most complicated C3GC and C 60 @[6]CPPA complexes are found to be at most 0.016 and 0.014, respectively. Considering that the T1 measure grows with the number of basis functions and that smaller than 0.02 T1 values are considered weakly-correlated already for very small systems [83], there appears to be no indication of even moderate static correlation. Moreover, neither the HF nor the CCSD iterations indicated any problems emerging usually for strongly correlated systems. The size of the singles and doubles amplitudes were also monitored in all domain CCSD computations indicating the validity of the single-reference approach, while it is convincing that the LNO approximations were found to operate excellently also for moderately statically correlated species [84]. The magnitude of the (T) correction compared to the full CCSD(T) interaction energy is also an informative measure of the static or dynamic nature of the correlation.
This ratio is consistently around 18-20% for all 8 complexes, which is well within the range observed for smaller and simpler systems, e.g., in the well-known S66 test set (cca. 13-24%) [85].

S2. DETAILS OF QUANTUM MONTE CARLO CALCULATIONS
The FN-DMC calculations mostly used 10 nodes with 28 cores each, and 14,000 walkers distributed across the cores (i.e. 50 walkers per core). We used 20 nodes for the C 60 @[6]CPPA complex and 28,000 walkers to reduce the stochastic error in a shorter time.
Here we give further details on (i) the optimization of the Jastrow factor for the reported complexes, (ii) time-step and node-structure tests for the coronene dimer and (iii) results of additional FN-DMC simulations of C 60 @ [6]CPPA. In addition, we report the total energies for C3A and C3GC complexes here in Table SVI.

A. Variational Monte Carlo Optimization of the Jastrow Factor
Variational Monte Carlo (VMC) obeys the variational principle, allowing the initial Slater-Jastrow wavefunction to be optimized iteratively towards a lower energy. Importantly, the zero-variance principle ensures that variance of the energy tends to zero as the exact energy of the system is approached. This is used in the varmin and varmin-linjas optimization algorithms in CASINO [73] to optimize the variable parameters of the Jastrow factor. The Jastrow factor is composed of explicit distance-dependent polynomial functions for inter-particle interactions, such as electron-electron (u), electron-nucleus (χ), and electron-electron-nucleus (f ), and is also system-dependent. For all complexes, we performed a term-by-term optimization using 24 parameters for u, 12-14 parameters per element for χ, and 8 parameters per element for f. The resulting VMC energy and variance for the complexes is given in Table SVII. It can be seen from Fig. S7 that the FN-DMC interaction energy of C2C2PD is converged within the stochastic error bar (corresponding to 1 standard deviation) with respect to the time-step in FN-DMC (from 0.003 to 0.03 a.u.). In addition, we computed PBE0 and PBE initial determinants (orbitals) from PWSCF, in order to assess the FN-DMC dependence of the interaction energy on the nodal-structure. Fig. S7 shows that the FN-DMC interaction energy is the same within the stochastic error bars of ∼0.5 kcal mol −1 across the three nodal-structures.  [54] has some advantages over the pre-existing standard algorithms: the locality approximation [74] (LA) and T-move scheme [75]). The DLA FN-DMC energies are less sensitive to the Jastrow factor that is used in combination with pseudopotentials at larger time-steps. This enables better overall convergence with the time-step in FN-DMC and the DLA method is also more numerically stable than LA. We tested the use of the DLA method for the GGG trimer and the coronene dimer and present the results in Table SVIII. The interaction energies of the GGG and C2C2PD complexes remain in The C 60 @[6]CPPA complex proved to be more challenging to compute with FN-DMC, due to numerical instabilities when using the locality approximation. This was alleviated by the use of the DLA method, and separately using the T-move approximation in place of the locality approximation. The T-move scheme reinstates variational form of the energy, but the energies with this approximation are more time-step dependent, as can be seen in

S3. COMPUTATIONAL REQUIREMENTS OF LNO-CCSD(T) AND FN-DMC
CPU core time and minimal memory requirements are collected in Table S2 D for repre-sentative examples: the CBH and C3GC complexes. The very Tight LNO-CCSD(T)/aug-cc-pVTZ computation for C3GC was found to be the upper limit for the CPU time requirement among all LNO-CCSD(T) computations. Compared to that it is interesting to note the case of CBH, which contains even more atoms and almost as much AOs as C3GC. However, due to the relatively low complexity of the wavefunction of CBH, its CPU time demand is found to be up to 100 times smaller than that of C3GC when using the same settings.
Unfortunately, the computations were scattered on multiple clusters and CPU types preventing the straightforward comparison of runtimes with various settings. For that reason CPU core times and the corresponding CPU types are reported. With that in mind, we find similar trends as in our previous report [14]. For instance, the memory requirement of LNO-CCSD(T) is exceptionally small compared to alternative CCSD(T) implementations, which was essential for the C3A, C3GC, and C 60 @[6]CPPA computations. Moreover, about 3-5 times more operations were performed when using one step tighter LNO settings, just as in our previous computations [14], which trend is highly useful to estimate the feasibility of analogous computations. It is also important to note that the CPU and memory requirement grow much more slowly with the basis set size than with conventional CCSD(T), where the operation count and data storage increase by about a factor of 10 with one step in the cardinal number hierarchy (e.g., from aug-cc-pVTZ to aug-cc-pVQZ).

S4. DETAILS OF DFT CALCULATIONS
The PBE0+MBD calculations were performed using FHI-aims v.190225 with all-electron numerical basis sets, with "tight" defaults and tier 2 basis functions for all elements. The total energy threshold for self-consistent convergence was set to 10 −7 eV. Spin and relativistic effects have not been included. London dispersion energies from the D4 model are computed with the dftd4 standalone program using the electronegativity equilibration charges (EEQ) and include a coupled-dipole based many-body dispersion correction (D4(EEQ)-MBD) [26].
The same geometries have been used as for the benchmark calculations for all structures.