Ab initio molecular dynamics of liquid water using embedded-fragment second-order many-body perturbation theory towards its accurate property prediction

A direct, simultaneous calculation of properties of a liquid using an ab initio electron-correlated theory has long been unthinkable. Here we present structural, dynamical, and response properties of liquid water calculated by ab initio molecular dynamics using the embedded-fragment spin-component-scaled second-order many-body perturbation method with the aug-cc-pVDZ basis set. This level of theory is chosen as it accurately and inexpensively reproduces the water dimer potential energy surface from the coupled-cluster singles, doubles, and noniterative triples with the aug-cc-pVQZ basis set, which is nearly exact. The calculated radial distribution function, self-diffusion coefficient, coordinate number, and dipole moment, as well as the infrared and Raman spectra are in excellent agreement with experimental results. The shapes and widths of the OH stretching bands in the infrared and Raman spectra and their isotropic-anisotropic Raman noncoincidence, which reflect the diverse local hydrogen-bond environment, are also reproduced computationally. The simulation also reveals intriguing dynamic features of the environment, which are difficult to probe experimentally, such as a surprisingly large fluctuation in the coordination number and the detailed mechanism by which the hydrogen donating water molecules move across the first and second shells, thereby causing this fluctuation.


Results
Potential energy surface. The fidelity of BOMD simulations for bulk systems largely hinges on the reliability of the potential energy surface (PES) obtained from the electronic structure method, and hence verifying the latter for intermolecular interactions is a crucial prerequisite for such simulations.  Supplementary Table S1. Compared with the accurate reference data of MP2/aug-cc-pVQZ or CCSD(T)/aug-cc-pVQZ, the MP2/cc-pVDZ (top panel of Fig. 1) overestimates the intermolecular attractions by as much as 50%, whereas the PES's of MP2/aug-cc-pVDZ (middle panel) and SCS-MP2/aug-cc-pVDZ (bottom panel) are reasonable. This underscores the importance of diffuse basis functions in describing intermolecular forces; namely, basis sets without diffuse basis functions such as 6-31G* are inadequate for BOMD simulations of liquid water and water clusters 19,24 . On the other hand, DFT calculations with the Becke-Lee-Yang-Parr (BLYP) functional and diffuse basis sets tend to underestimate Δ E considerably and are also inadequate (bottom panel and Supplementary Table S1). The PES's of water dimers in the C s , C i , and C 2v geometries as a function of the oxygen-oxygen separation (R OO ) were obtained at various theoretical levels (solid curves). The dimer geometries were generated by varying R OO , with all other geometrical parameters frozen at their MP2/aug-cc-pVTZ optimized values under the appropriate symmetry constraint. The accurate results from MP2/aug-cc-pVQZ (dashed curves) and CCSD(T)/aug-cc-pVQZ (dots) are superimposed. The binding energy at CCSD(T)/CBS is indicated as dashed black lines.
Scientific RepoRts | 5:14358 | DOi: 10.1038/srep14358 SCS-MP2/aug-cc-pVDZ (bottom panel) describes the attractive tail of the PES slightly better than MP2/aug-cc-pVDZ. Its repulsive parts are a little too strong, making the PES's somewhat shallower and equilibrium R OO longer than the reference data. Nonetheless, the value of Δ E predicted by SCS-MP2/ aug-cc-pVDZ (4.86 kcal/mol) is closer to the complete-basis-set (CBS) limit of CCSD(T) of 5.02 kcal/mol (dashed lines) than the value obtained at MP2/aug-cc-pVDZ (5.26 kcal/mol), while the MP2-calculated Δ E seems to converge more accurately at the correct limit with a further basis-set extension [28][29][30] .
Note that the calculated values of Δ E and . R OO min in Supplementary Table S1 do not include the zero-point vibrational energy (ZPVE) corrections or nuclear quantum corrections, which have the effect of decreasing Δ E and increasing . R OO min by 0.032 Å 31 . Since such corrections are not at present considered in our BOMD simulations, we find SCS-MP2/aug-cc-pVDZ convenient because its Δ E (4.86 kcal/mol) is smaller and its . R OO min (2.942 Å) is longer than those of CCSD(T)/CBS (5.02 kcal/mol and 2.910 Å) and thus expected to be closer to experimental values that include the ZPVE effect 32,33 . Needless to say, the use of a large basis set such as aug-cc-pVQZ in ab initio electron-correlated BOMD simulations takes a prohibitive computational cost and is out of the question.
In this article, therefore, we primarily used SCS-MP2/aug-cc-pVDZ to generate atomic forces. We additionally used MP2/cc-pVDZ as the secondary method to assess the effect of overestimated Δ E and underestimated . R OO min on the simulation results.

Radial distribution function.
No a priori information exists about the phase diagram of water for the ab initio potentials. The first step is, therefore, to identify where the liquid phase boundaries exist. One approach is to determine the melting temperature computationally 15,16 . Another is to vary the temperature such that the BOMD simulation reproduces the experimental liquid structure at ambient conditions 12 . The liquid structure is inferred from the radial distribution function (RDF), which can be observed by neutron or X-ray scattering experiment 34,35 and serves as a fingerprint of local structures and thus phases. We first performed BOMD simulations at T = 300 K, which produced a gas-phase-like RDF, and then reduced the temperature until the calculated RDF coincided with that of the observed result for the liquid phase at ambient conditions. Figure 2 shows the oxygen-oxygen RDF obtained from the SCS-MP2/aug-cc-pVDZ simulation at T = 250 K. It reproduces the experimental RDF of the liquid phase observed at T = 25 °C obtained from the X-ray diffraction technique 34 . The calculated position of the first peak is close to the experimental value (~2.8 Å) and the agreement of RDF in the region of R OO > 3.2 Å is striking. The slight difference in the calculated peak intensity from the experimental one at R OO = 2.8 Å can be diminished by taking into account the nuclear quantum effect, as noted from the previous path-integral molecular dynamics simulation 13 . The average coordination numbers of 4.7 (R OO ≤ 3.4 Å) or 4.4 (R OO ≤ 3.3 Å) are in excellent agreement with the experimental values 34 . We therefore conclude that the SCS-MP2/aug-cc-pVDZ simulation at T = 250 K and ρ = 1 g/cm 3 provides an accurate description of the structural features of liquid water at ambient conditions.
In contrast, the RDF of MP2/cc-pVDZ at T = 250 K exhibits an excessively broad first peak, which is a manifestation of an over-coordinated local structure with an average coordination number of as large as 8 (R OO < 3.7 Å). This, in turn, is due to too strong an attractive intermolecular interaction predicted by MP2/cc-pVDZ, as evidenced by Fig. 1. This first peak is followed by the second peak, which is further displaced at longer R OO , indicating that the second shell is pushed outward by the overcrowded first shell. Hence, the MP2/cc-pVDZ description of liquid water structure is nonphysical and this is largely due to the smallness of the basis set.
The isobaric-isothermal MC simulations at the MP2 level with a mixed Gaussian-planewave basis set was performed by Del Ben et al. 25 . They yielded the liquid density (ρ = 1.02 g/cm 3 ) slightly larger than the correct value. This is in qualitative agreement with our simulation, in which the average pressure is found to be − 0.6 ± 0.5 GPa at T = 250 K and ρ = 1.0 g/cm 3 ; the larger calculated liquid density at 1 bar means that a negative pressure is required to maintain it at 1 g/cm 3 . The RDF calculated at T = 295 K by Del Ben et al. was, however, overstructured, suggesting that the liquid is super-cooled, whereas the RDF obtained from our simulation is much closer to experiment. The source of the quantitative difference is hard to pinpoint because there are many approximations involved in both studies. We speculate, however, that the most likely primary source is the different choices of the basis set, in particular, the use of a pseudopotential in the preceding HF calculation in Del Ben et al.

Self-diffusion coefficient.
Whereas the RDF characterizes time-averaged local structures, the self-diffusion coefficient (D) measures dynamical properties of liquids. Its value is related to the mean square displacement (MSD) by Einstein's diffusion equation. The calculated value of D ≈ 0.27 Å 2 /ps from our BOMD simulation at T = 250 K compares favorably with the experimental value of 0.23 Å 2 /ps at T = 25 °C 36 . Hence, we argue that the present simulation properly describes the dynamical properties of liquid water at ambient conditions also.
Water molecular geometry in the liquid phase. Each water molecule undergoes a significant geometry change on going from the gas to liquid phase. The water molecule in the gas phase has the OH bond length (R OH ) of 0.966 Å and HOH angle (∠HOH) of 103.9° at the SCS-MP2/aug-cc-pVDZ level, which are in good agreement with the experimental values 37 of 0.957 Å and 104.5°. In the liquid phase, the calculated bond lengths and angles form distributions ( Fig. 3) with R OH = 0.980 ± 0.019 Å and ∠HOH = 104.7 ± 4.6°. Two experimental investigations of liquid water yielded measurements of (R OH = 0.970 ± 0.005 Å; ∠HOH = 106.1 ± 0.9°) 38 and (R OH = 0.983 ± 0.008 Å; ∠HOH = 104.1 ± 1.9°) 39 . Remarkably, the BOMD simulation predicts an average OH bond elongation (relative to the gas phase) of 0.014 Å, which is in excellent agreement with the measured elongation of 0.013 Å 38 . It is notable that the bond angle in liquid water ranges from 90° to 120° with a large standard deviation of 4.6°. Figure 4 plots the distribution of the coordination number, that is, the number of water molecules in the first shell, as well as the distribution of the hydrogen-bond number. According to our simulation, a water molecule has an average coordination number of 4.7 and an average hydrogen-bond number of 3.8. The former value is in good agreement with experiment 34 . In most cases (65%), each molecule is hydrogen-bonded to four neighbors with two serving as a H-donor (hydrogen donating water) and two as a H-acceptor (hydrogen accepting water), while the remaining neighbors in the first shell stay non-hydrogen-bonded or weakly hydrogen-bonded, as shown in Fig. 5.

Coordination numbers and intershell exchange.
The coordination number displays a surprisingly large fluctuation (4.7 ± 0.9), ranging from mono-coordination to octa-coordination. Overall, tetra-coordination is the most common (41%), followed by penta-coordination (34%) and hexa-coordination (15%). Tri-coordination (6%) and hepta-coordination (3%) are also significant, while bi-coordination (0.6%), octa-coordination (0.1%), and mono-coordination (0.01%) occur much less frequently. A majority of the penta-and higher-order  coordinations have one or more non-hydrogen-bonded or weakly hydrogen-bonded water molecules in the first shell, judging from the rapid falloff of the hydrogen-bond number after four. Figure 6 illustrates the time-evolution of the coordination numbers of three randomly selected water molecules. The large and rapid variation in each case may be related to the significant density fluctuations observed even in a nm-sized droplet of liquid water, as discussed in a recent study 40 . Figure 7 shows the (R OO , θ) and (R OO , ϕ) population contour maps of the oxygen atoms for R OO up to the second-shell RDF peak (R OO ≤ 4.5 Å) as well as the (θ, ϕ) population contour map for the first shell (R OO ≤ 3.4 Å). They reveal that H-acceptors (the A zone) are found around R OO = 2.8 Å, θ = 50°, and ϕ = 0°, while H-donors (the D zone) populate around R OO = 2.8 Å, θ = 120°, and ϕ = ± 90°. A significant population density is observed at θ = 100° around R OO = 3.4 Å in the (R OO , θ) contour. This is the likely path through which a non-hydrogen-bonded or weakly hydrogen-bonded neighbor in the D zone migrates from the first shell to the second shell and vice versa, causing the fluctuation in the coordination number.
Our BOMD trajectory data such as those shown in Fig. 6 provide even more detailed information about key events responsible for the coordination number fluctuation. Let us define a 5-4-5 hetero-exchange as the process by which a penta-coordinated water molecule loses its non-hydrogen-bonded neighbor to become tetra-coordinated, and a different water molecule takes its place (hetero-exchange), restoring the penta-coordinated state. This event is frequently observed in our trajectories, lasting an average of ca. 100 fs and occurring to a penta-coordinated species approximately every 1.0 ps. Similarly, we use the term the 4-5-4 self-exchange to describe the transient attachment of a non-hydrogen-bonded water molecule to the first shell of a tetra-coordinated water, followed by the detachment of the same molecule. This process occurs, on average, approximately every 1.5 ps with a duration of ca. 80 fs. The 5-4-5 self-exchange and 5-6-5 hetero-exchange are also noted at an average frequency of every few ps. The high frequency and long duration of these exchange events involving tetra-and penta-coordinated species are due to their greater stability and thus longer average lifetime.
Dipole moments. The dipole moment of a water molecule in the liquid phase has been under debate for a long time, partly because there is no experimental means of measuring it directly. Accurate theoretical predictions are, therefore, extremely valuable, but previously reported computed values ranged widely from 2.6 to 3.0 D 11,41,42 . An earlier DFT-MD study placed it at 2.66 D 10 , but later revised it to 2.95 D 41 . The latter, larger predicted value may be traced to the significantly overestimated polarizability characteristic of DFT. Subsequently, Badyal et al. 43 determined the partial charge on the hydrogen atom as q H = 0.5 ± 0.1 a.u. by X-ray diffraction. On this basis, they suggested the dipole moment of 2.9 ± 0.6 D, which had such a large error bar that it encompassed nearly all computationally suggested values.
The dipole moment of the water molecule in the gas phase calculated by SCS-MP2/aug-cc-pVDZ is 1.878 D, which agrees well with the experimental value of 1.855 D 44 . Figure 8 plots the distributions of the permanent dipole moments μ { } i 0 (obtained with the embedding field turned off) and the total dipole moments {μ i } (calculated with the embedding field within 9 Å and thus including the induced dipole moments) in the liquid phase. The permanent dipole moment μ i 0 at SCS-MP2/aug-cc-pVDZ is 1.872 ± 0.07 D, whereas the total dipole moment μ i is 2.67 ± 0.2 D. Hence, the induced dipole moment is significant and amounts to as much as 0.80 D on average. it could further increase slightly by the nuclear quantum effect 13 , but perhaps by not more than 0.05 D. Furthermore, the total dipole moment is broadly distributed, showing large fluctuations from 1.8 to 3.5 D, with about 5% exceeding 3.0 D, owing more to the diversity of electrostatic fields than to the monomer geometry fluctuation. The partial charge of the hydrogen atom, q H = 0.48 ± 0.03 a.u. (Fig. 8), is in good agreement with the mean of the experimental value of Badyal et al., q H = 0.5 ± 0.1 a.u. 43 . The difference in the total dipole moment between our study and Badyal et al. may be traced to the difference in the water structure: the OH bond length proposed 45  IR and Raman spectra. The SCS-MP2/aug-cc-pVDZ BOMD simulation predicts the vibrational frequencies of the water molecule in the gas phase at 1676, 3800, and 3900 cm −1 . Though BOMD simulations sample the fully anharmonic PES, owing to the classical treatment of vibrations, these frequencies are too high relative to the experimental values (1595, 3657, and 3756 cm −1 ) 37,47 , for reasons that are well understood 48 . To correct such deviations rigorously, one should use path-integral molecular dynamics 8,49,50 , which can take into account the ZPVE or quantum nuclear effects. A more empirical correction is to simply multiply the calculated frequencies by a scale factor (0.96 for the gas phase and 0.93 for the liquid phase) that can bring the calculated peak positions in better agreement with those observed experimentally.
The IR and Raman spectra of liquid water computed from the SCS-MP2/aug-cc-pVDZ BOMD simulation at T = 250 K are compared with the corresponding experimental results at ambient conditions [51][52][53] in Fig. 9. Unlike most calculated spectra, each band in our simulated spectra is not convoluted with an assumed band shape, but is instead associated with a physically meaningful width that reflects the diversity of local environments in which the vibrations occur. The BOMD simulation reproduces all three major features of the observed IR spectrum (top panel): the broad, intense OH stretching band above 3000 cm −1 , the narrower peak due to the bending mode at around 1600 cm −1 , and the manifold of intermolecular vibrations below 1000 cm −1 .
The OH stretching band in the observed IR spectrum peaks at about 3400 cm −1 with a weak shoulder around 3250 cm −1 and a full width at half maximum (FWHM) of 375 cm −1 52,53 . Remarkably, the BOMD simulation predicts the OH stretching band with the correct band width (FWHM of about 375 cm −1 ) and slightly asymmetric band shape, which also agrees with the observed shape, though the calculated peak positions of 3550 and 3700 cm −1 are blue-shifted. It is striking that the classical "continuum" picture 48 , in which diverse instantaneous hydrogen-bond environments surrounding the OH stretching vibrations give rise to the band width, is borne out by our simulation. Our time-domain computation of the spectra should also naturally account for the motional narrowing of this band, which is estimated to be as much as 30% of the width 54 .
The OH stretching band of the observed VV Raman spectrum has bimodal peaks at about 3250 and 3400 cm −1 and a FWHM of about 425 cm −1 , while the observed VH spectrum shows the OH stretching band around 3460 cm −1 with an asymmetric line shape and a FWHM of about 300 cm −1 51 . The calculated VV and VH Raman spectra are also in agreement with the observed spectra. It explains the asymmetric line shape of the OH stretching band in the VH spectrum and the double-peaked nature of the band in the VV spectrum, including the difference in peak positions between the VH and VV components, known as the Raman noncoincidence effect 55 . With the aforementioned frequency scaling by a factor of 0.93, the simulated VV and VH spectra are again in excellent agreement with their experimental counterparts both in peak positions and shapes.
It could be emphasized that our spectral simulation does not involve any adjustable parameter (which is not to say that there is no approximation) in the anharmonic force field, mode-mode coupling, polarizability tensor, etc., unlike previous analyses on the same bands 48 . Note that polarizable model potentials of water 8,56 currently fall short of simultaneously accounting for polarization effects on intermolecular interactions and optical response, such as the Raman effect. Our ab initio method can achieve just that, although the peak positions are systematically overestimated because of the classical treatment of vibrations.

Discussion
We have performed ab initio BOMD simulations of liquid water at the SCS-MP2 and MP2 levels of theory and computationally reproduced the structural properties (the radial distribution function, monomer geometry and its fluctuation) and structure-driven collective properties (the self-diffusion coefficient, coordination number and its variation, intershell exchange) as well as the electronic and response properties (permanent and induced dipole moments, IR spectra, and isotropic and anisotropic Raman spectra) simultaneously, accurately, and from first principles. The level of theory has been chosen to reproduce the CCSD(T)/aug-cc-pVQZ results of the water dimer, allowing us to study the electronic details of liquid water with unprecedented accuracy.
The calculated properties of liquid water are in good agreement with the experimental results, when the latter are available. The simulated IR and Raman spectra correctly predict the band shapes and widths of the OH stretching bands and the difference between their isotropic and anisotropic Raman components, which are exceedingly difficult with classical MD with parameterized force fields, even if they are polarizable.
For properties that are difficult to probe experimentally, our simulations provide unique insights. The average bond length and angle of the water molecule in the liquid phase are found to increase only minimally, but range from 0.92 to 1.05 Å and from 90° to 120°. The coordination number fluctuates even more greatly from mono-coordination to octa-coordination, though the tetra-(41%) and peta-coordinations (34%) are most common. Such large fluctuation in the coordination number might lead to significant density fluctuations. We have identified 5-4-5 hetero-exchange and 4-5-4 self/hetero-exchange as the key events, occurring approximately every 1-2 ps with an average duration of ca. 0.1 ps and causing the large fluctuation in the coordination number. Such exchanges take place as a non-hydrogen-bonded water molecule in the D zone migrates between the first and second shells. The total dipole moment (μ i = 2.67 ± 0.2 D) of the water molecule in the liquid phase also shows a large fluctuation, ranging from 1.8 to 3.5 D, reflecting more the diversity of the electrostatic field surrounding the water molecule than the geometry fluctuation.
Hence, the present method allows us to assess the validity of uncertain results or conclusions about a whole range of liquid water properties. It can also be exploited to study other unknown collective properties of liquid water and various solvents as well as colligative properties of solute-solvents generally.

Methods
Born-Oppenheimer molecular dynamics. BOMD simulations were carried out in the canonical (NVT) ensemble with the velocity Verlet algorithm in conjunction with the Nosé-Hoover chain method 57,58 . A time step of 1 fs was used. An initial liquid structure of (H 2 O) 32 in a cubic cell of side 9.858 Å was obtained from an equilibration run with the TTM3-F force field 8 at T = 300 K. Using the same force field, we verified that the 32-water simulation yielded the calculated RDF which was unchanged from the result of a 512-water simulation and was, therefore, adequate. Using SCS-MP2/aug-cc-pVDZ or MP2/cc-pVDZ energies and atomic forces at T = 250 K, a 6-ps equilibration run was performed, after which the trajectory was sampled every 4 time steps during a 5-ps production run. IR and Raman spectra. The IR and Raman spectra were calculated by Fourier transformation of the time-correlation functions of the system's dipole moment and polarizability, respectively. For a direct comparison with the experimental spectra, the harmonic quantum correction was applied to the classically computed spectra 59 . IR intensity I IR (ω) at frequency ω then becomes 56 Here, the dipole moment of the ith monomer embedded in the electrostatic field, μ i , is directly computed by the SCS-MP2 method.
The harmonic-quantum-corrected, Bose-Einstein-reduced isotropic and anisotropic Raman spectra were calculated by 56