Understanding high pressure molecular hydrogen with a hierarchical machine-learned potential

The hydrogen phase diagram has several unusual features which are well reproduced by density functional calculations. Unfortunately, these calculations do not provide good physical insights into why those features occur. Here, we present a fast interatomic potential, which reproduces the molecular hydrogen phases: orientationally disordered Phase I; broken-symmetry Phase II and reentrant melt curve. The H2 vibrational frequency drops at high pressure because of increased coupling between neighbouring molecules, not bond weakening. Liquid H2 is denser than coexisting close-packed solid at high pressure because the favored molecular orientation switches from quadrupole-energy-minimizing to steric-repulsion-minimizing. The latter allows molecules to get closer together, without the atoms getting closer, but cannot be achieved within in a close-packed layer due to frustration. A similar effect causes negative thermal expansion. At high pressure, rotation is hindered in Phase I, such that it cannot be regarded as a molecular rotor phase.

S ince the discovery of solid molecular hydrogen in 1899, the nature of this phase has remained controversial 1 . It is now believed that the solid "Phase I" comprises rotating hydrogen molecules on a hexagonal close-packed lattice 2 . With increasing pressure the rotation becomes hindered 3 by intermolecular interactions, both steric and electrostatic, leading ultimately to phase transformations to a low temperature Phase II 4 , in which quadrupole-quadrupole interactions (EQQ) arrest the rotation 5 , and a high-pressure Phase III 6,7 , in which steric interactions dominate.
Experimental study of these phases has proved challenging. Xray study showed the hcp structure, but could not resolve molecular orientation at low temperature 8 , and the first room temperature only completed in 2019 9 . Raman spectroscopy shows peaks corresponding to quantum rotors at low pressure, which gradually broaden and shift with pressure, and a distinctive sharp phonon mode which rules out cubic close packing as a structure [10][11][12][13][14][15] . The melt line has a strongly positive Clapeyron slope at low pressures, with a turnover around 100 GPa [16][17][18][19] . The negative slope means that even though the solid is hexagonal "closepacked", the liquid must be even denser. The turnover also means the liquid has higher compressibility, but how this comes about remains unexplained. X-ray studies at low temperature traversing Phase I-II-III do not show any convincing structural changes, in part because it has proven impossible to get sufficient resolution to determine the molecular orientation 8,9 . Spectroscopy gives vibrational data, which are still insufficient to determine the structures of phases II, III, and IV. There have been many and varied attempts to identify the structures via simulations [20][21][22][23][24][25][26][27][28][29][30] . However, a consensus has not yet been reached. Based on fully ab initio calculations, including density functional theory (DFT) or quantum Monte Carlo (QMC) 31 , a number of candidate structures have been proposed for Phase II and III. Besides differing molecular orientation, they are all similar, consisting of primitive cells with lattice sites close to hcp 22 . Among the structures, the P2 1 /c-24, C2/c-24, and Pc-48 structures provide low-energy candidate structures for phases II, III, and IV.
The modern theory of the structure of these phases is based around electronic structure calculations. The early work involved calculating the ground state, assuming classical nuclei, then adding quantum-nuclear effects via the quasiharmonic approximation. This methodology, whether based on DFT or QMC, predicts hcp-like ground states for Phases I-III in agreement with X-ray data. However the spectroscopic signature of the Phase II-the appearance of many sharp, low-frequency, and peaks 11,32 -is not well reproduced by the quasiharmonic calculations. As explained in the previous paragraph, the likely cause is a failure of the harmonic mode assumption for excited states, rather than the DFT itself.
To understand the high-temperature phases, one needs to examine non-harmonic behaviour, including rotation, which means going beyond a single unit cell, e.g. using molecular dynamics. Molecular dynamics requires forces on each atom based on the positions of all the atoms in the system, which requires a force model which is fast enough to allow large simulations. Here we use a machine-learning approach to derive a transferable force model based on an interatomic potential. There are several approaches to machine-learning interatomic forces [33][34][35] , which balance speed, transferability and accuracy. We adopt an approach focusing on transferability.
Any machine-learned potential should conserve energy, and therefore be based on a Hamiltonian (the potential). Forces are guaranteed to be conservative if they depend on translational and rotational invariant quantities: the "fingerprint" of each atom. We are interested in molecular phases here, so our potential specifies which atoms are "bonded" and allows stretching but not breaking of bonds.
In this paper we apply the machine-learning approach so create an interatomic potential for molecular hydrogen. We show that the potential describes the three molecular solid phases, with free-rotor Phase I, and broken-symmetry Phase II and a high-pressure Phase III. Furthermore, the melt line has a maximum, such that at high pressure the liquid is denser than the hcp solid, a feature we attribute to short-range directional order giving lower quadrupole-quadrupole interaction in the solid. The potential is trained on energies and classical Hellmann-Feynman forces derived from standard DFT in the Born-Oppenheimer approximation adopted by all standard DFT codes, so the interatomic potential is the same for deuterium, hydrogen-deuteride (HD) and hydrogen. The potential accounts for binding due to electronic structure: contributions from quantum-nuclear effects can be incorporated using lattice dynamics or path integral methods.

Results
Fitting forces and the phase diagram. A particular challenge for hydrogen comes from the hierarchy of energies. The covalent bond is much stronger than the van der Waals attraction between molecules, which is turn is much stronger than the EQQ interactions which determine molecular orientations. To address this our potential combines a hierarchical fitting strategy alongside machine-learning (HMLP) described below.
For transferability testing, we used the standard approach of fitting to a subset of the data and testing against a different subset. Furthermore, we used an iterative fitting process: a trial potential was fitted, and applied in both Phase II annealing and melting line MD simulations. If novel configurations were found, they were used to generate more reference states for the DFT database, and the fitting process was repeated. This iterative process ensures that spurious structures are suppressed and the ground state structure is the same as found in DFT. Technical details of the forcefield parameterisation are given in the "Methods" section.
Phase diagram. Figure 1 shows the very good agreement between the classical HMLP and the DFT phase diagrams. By eliminating finite size effects, the HMLP can capture the full longrange correlations, however, this does not appear to have a significant effect on the phase boundaries.
In Phase I, the H 2 molecules exhibit free rotation at pressures below 40 GPa and temperatures below 900 K (orange hexagon symbols); at higher pressures the rotation is inhibited but there is no long-ranged orientational order. At low temperatures, Phase II becomes stable (red triangles), and the stable temperature region increases gradually with pressures. At high temperatures, the hcp lattice collapses to a liquid state. The calculated melting curve has a strong positive slope (dT/dP > 0) at low pressures, reaches a maximum at around 900 K and 90 GPa, and then drops. The HMLP predicted phase diagram agrees reasonably with experimental observations, as well as DFT (Fig. 1).
The HMLP and DFT predictions are good for the melt curve, but both overstabilise the broken-symmetry Phase II. This is due to the lack of quantum-nuclear effects, notably the zero-point energy, and can be addressed by including quantum-nuclear effects in the simulation. The discrepancy in the Phase I-II line does not indicate any inaccuracy of the HMLP itself. Our predicted melting curve is consistent with experiments: the value for the melting curve maximum is located between 80 and 100 GPa and 900 K, similar to the HMLP potential values. It also agrees with two-phase ab initio simulations, which proposed a gradual softening of the intermolecular repulsive interactions as its cause 16 . The close agreement of the HMLP transition pressure with experimental data enables us to accurately simulate behaviours of temperatureor pressure-driven phase transition between Phase I and II. A "Phase III" is observed at higher pressures, corresponding to a different symmetry-breaking. However, by design the present HMLP model should start to fail to capture the properties of H 2 at still higher pressures, where molecule dissociation needs to be considered.
Nature of Phase I. Phase I can be easily recognised in MD by ordering of the molecular centres on the hcp lattice, and disorder of the orientations. Although frequently referred to as a free-rotor phase, we find this to be true only at low pressures. As pressure is increased the angular momentum autocorrelation becomes shorter than a single rotation, and then acquires a negative component, indicating that the molecule is librating.
Another characteristic of Phase I is the molecular vibration or "vibron": in Raman scattering this corresponds to the inphase vibration of all molecules. The vibron frequency first increases, then decreases with pressure. Two plausible reasons are given for this reduction: either increased intermolecular coupling or weakening of the covalent bond. In our model, the covalent bond is always described by the same Morse parameters, so changes in the vibron frequency can arise only from resonant interactions between molecules, not weakening of the bond. Thus reproducing the reentrant vibron behaviour is a test of both the physical basis and the parameterisation of the model.
Since the molecules are rotating in Phase I, lattice dynamics cannot be used, so Raman phonon frequency is numerically characterised by the in-phase mode-projected velocity autocorrelation function (VAF) 36 . Trajectories and velocities were produced from 150 K HMLP-MD simulations within the micro-canonical ensemble (NVT) initiated in the P2 1 /c-24 structure. A very fine time step of 0.05 fs was used and the trajectory and velocities were saved every ten time steps. By calculating the bond stretching velocities and Fourier transformation of the VAFs we calculate both the total vibron density of states 26,37 from and the signal from the most strongly Raman-active mode, where ik runs over all molecules (comprising atoms i and k). A similar projection method is used for the E2g phonon 38 . Figure 2 plots the calculated total vibron spectra of solid and liquid hydrogen as a function of pressure from MD simulations. Both show a signature of vibron turnover above a critical pressure (about 54 GPa), consistent with the experimental observations 11 . This proves that bond weakening is not required for the turnover, since our potential has a fixed bond strength. Notably, the mean bond length in Phase I decreases monotonically with pressure ( Fig. 2c), again at odds with ideas of bond weakening. What appears to be happening is a competition between two effects: at higher pressures the compression of the bond causes an increase in the frequency due to anharmonicity in the potential, whereas above 54 GPa the frequency is lowered due to coupling between the molecules.
The hcp structure has a Raman-active mode (E2g symmetry) corresponding to shearing motion of the basal plane. The frequency of this mode is experimentally well-determined and extremely pressure-dependent, from 36 cm −1 at zero pressure to 1100 cm −1 at 250 GPa [39][40][41][42] . Figure 2d shows a comparison of our potential with experimental pressure dependencies ν(P) of the E2g optical. The red symbols in are our HMLP-MD predictions, consistent with the DFT data of this mode extracted from our calculations. Comparing the calculations with experiment shows that the HMLP predicted frequency curves agree better with experiment than existing isotropic empirical potentials 43,44 (olive curve).
Denser than close-packed liquid. Figure 1 shows that the potential correctly reproduces the turnover and negative Clapeyron slope. We investigated the possible explanation for this denser than close-packed liquid. Figure 3a shows the equation of state for both solid and liquid phases, with the crossover indicating where the liquid is denser than the solid. The HMLP predicts a negative thermal expansion, which is consistent with DFT 45 . The normalised radial distribution function (Fig. 3b) shows that the liquid structure is essentially unchanged with pressure up to the pressures where bond breaking becomes a factor. We therefore deduce that the denser liquid is not related to the molecular-atomic transition. Figure 4 compares the solid and liquid at the melting point. They are remarkably similar: close to the phase boundary the liquid shows five discernible neighbour peaks indicating shortranged structure to 10 Å. The molecular bond length is longer in the liquid than the solid (shown more clearly in Fig. 2), but the separation between molecules is noticeably smaller in the liquid as evidenced by the first peak in the molecule-molecule RDF. This means that the molecules get closer together in the liquid, despite being longer.
Intermolecular interactions are dominated by quadrupolequadrupole interactions and steric repulsion. Table 1 shows the implied contribution from quadrupole-quadrupole interactions calculated by electrostatics from HMLP simulations. Although the ML potential has no explicit electrostatic terms, there is a strong orientation correlation, which lowers the quadrupolar energy, not only in Phase II, but also in Phase I and to a lesser extent in the liquid. We hypothesised that molecules can get closer together if their constituent atoms are further apart (i.e. in an "X" shape viewed down the intermolecular vector). By contrast, the solid has orientations which offer higher cohesive energy. Figure 5b investigates this further using DFT, showing that the T configuration which optimises the quadrupole interaction becomes unfavourable with respect to the X configuration at a separation of 2.25 Å. As we saw in Fig. 4, the nearest neighbours are already this close by 20 GPa.  To quantify this, we looked across the various fingerprints to see which most strongly differentiate liquid from solid configurations on the melt line. This turns out to be fp_E QQ , the fingerprint with the same functional form as the quadrupole-quadrupole interaction: the MLP has learned that this is important. Figure 5a shows that the contribution from fp_E QQ becomes significantly higher in the solid above 70 GPa. We note that fp_E QQ is dimensionless, and its fitted contribution to the potential is different from the estimated quadrupole-quadrupole energy compared in Table 1.
These findings explain why the liquid is denser than the solid. At pressures above the melting point maximum, orientationdependent interactions are strong enough that rotation is inhibited [46][47][48] . However, these low-energy arrangements are typically associated with larger intermolecular distances (e.g. the T configuration). By contrast, the liquid favours other arrangements which allow the molecules to come closer.
The X configuration maximises the atom-atom distance for a given intermolecular separation. This and similar arrangements compensates for the smaller molecule-molecule distance in the liquid to give the same peak position in the atom-atom RDF in both liquid and solid.
Nature of Phase II. We performed extensive HMLP-MD simulations around the Phase I-II boundary, with different starting configurations, to determine candidate structures and phase stability of H 2 Phase II. At 150 K and 20 GPa, the orientations of the H 2 molecular axes are almost randomly distributed along different directions, indicating Phase I with freely rotating molecules. Upon compression to the high pressure of 80 GPa and cooling to a low temperature of 50 K, the material transforms to an orientationally ordered phase in which the molecular rotations are restricted (Fig. 1). By carefully comparing it with candidate structures proposed by ab initio calculations, we find that the lattice and molecular ordering is close to P2 1 /c-24, which has been one of the most thoroughly studied and strongest candidate for Phase II 5,22,23,49 .
The HMLP does not include quadrupole interactions explicitly -it has "learnt" them. Table 1 shows what the quadrupole interactions would be, using electrostatic calculation based on the HMLP configurations. The large negative values indicate the prevalence of quadrupole-type ordering: strongest in Phase II. The differences, tens of meV, is of similar magnitude to the phase   The corresponding RDF indicates that the centre of each molecule remains close to the hcp lattice sites. Consequently, we define an order parameter O relating the structures of Phase II in our HMLP-MD simulations with the static-lattice DFT predictions of P2 1 /c-24. The average value hOi exhibits a sharp change as the system transitions from the structured Phase II to the rotationally symmetric Phase I with increasing temperature. Transition temperatures were taken at discontinuous jumps in hOi. This produces the phase diagram shown in Fig. 6b. Note that the transition temperatures obtained from analysis of hOi agree with those obtained from peaks in the heat capacity calculated as ∂H ∂T À Á P from finite differences (see Supplementary Fig. 5). The phase boundary agrees well with the experiment 32 , particularly for the more classically behaved deuterium. Similarity to experiments on deuterium rather than hydrogen is perhaps unsurprising, since nuclear quantum effects such as zero-point motion are significant at the low temperatures investigated here.
Transition to Phase III. Above 160 GPa we find a high-pressure transformation to a broken-symmetry structure different from Phase II dominated by efficient packing rather than EQQ, this is at approximately the same pressure as Phase III.
Experimentally, Phase III is associated with a sharp drop in the vibron frequency and the appearance of a strong IR signal. This implies a non-centrosymmetric structure and a weakening of the molecules. In studies of HD a process of bond dissociation and recombination ("DISREC", 2HD → H 2 + D 2 ) has been observed 50 . This bond breaking is seen in DFT to also occur in pure H 2 26,30 . Our potential does not allow for bond breaking, so we have not studied the dynamics of this "Phase III" in detail.

Discussion
In summary, we have introduced a heirarchical, iterative machine-learning based interatomic potential for atomistic simulations of H 2 molecules, by directly learning from reference ab initio molecular dynamics simulations. The resultant HMLP-MD approach predicts angular energy dependence in the range of tens of meV/atom and demonstrates good transferability to various structural environments. Several applications have been presented for which our potential is particularly well suited. The fast, transferrable potential is also suitable for a wide range of further applications and extensions, including compounds, bond breaking, and path integral calculations.
The simulations reproduce the equilibrium temperature-pressure phase diagram for molecular phases (I, II, III, and melt, P < 160 GPa). The maximum in the melt curve in hydrogen is highly counterintuitive-it requires that the liquid is denser than the hexagonal close-packed solid. By detailed simulation we resolve this by showing that certain molecular orientations (e.g. X) allow the molecules to approach more closely, while others (e.g. T) have lower quadrupole energy. By monitoring an order parameter corresponding to the quadrupolar interactions find that, at high pressures, Phase I develops intermolecular correlations which lower the energy, while the liquid has correlation which lower the volume.
The simulations of the Phase I-II boundary show a transition from an ordered Phase II to a orientation-disordered Phase I. The molecules in Phase I are not freely rotation, and we find that rotation about the c-axis persists to higher-pressure/lower-temperature than rotation out of plane.
Our HMLP potential also has shown the capability of predicting the pressure dependence of the Raman-active E2g mode, consistent with experiment and previous DFT calculations. We explain the maximum frequency of the vibron as due to competition between molecular compression and stronger intermolecular coupling. Weakening of the covalent bond is not required.

Methods
Machine-learned interatomic potential Learning dataset. Structures for reference atomic environments and benchmarks were accumulated from DFT-based ab initio MD runs. The DFT calculations were performed using the CASTEP package 51 within the Perdew-Burke-Ernzerhof generalised gradient approximation (PBE) 52 for the exchange-correlation function. A cutoff energy of 1000 eV for the plane-wave basis set and a k-point mesh of 1 × 1 × 1 were selected. To ensure the transferability of the potential to a wide variety of atomistic situations, H 2 in different geometric arrangements was considered, including modest-sized bulk samples in Phase I, II, and liquid, composed of 144 H 2 molecules. Moreover, unusual configurations found in HMLP-MD with preliminary versions of the potential were added to the DFT training set to improve performance and transferability. The final accumulated dataset includes up to 41,468 configurations, which are provided as a separated file in the Supplementary Information.
The PBE functional was chosen because it has become the de facto standard in studies of molecular systems, and high-pressure hydrogen. PBE has been criticised for overstabilising the metallic phases: this is due to its behaviour at high electron density gradients 53 , and not vdW corrections. In this work we only consider molecular phases, so this is not a concern. We emphasise that determining the suitable dataset is not straightforward. Numerous iterations of the potential were required to obtain a good fit to the phase diagram. A good fit to DFT energies of known phases is not evidence that other phases are unstable: we test each iteration of the potential by running MD simulations in NPT ensemble, cycling the pressure and temperature between the regions expected for Phases I, II, and III to ensure that all phase transitions were between the fitted phases. When a crystal structure different from the fitted one was found, this new phase was calculated using DFT. If this showed that the ML potential described the phase poorly, it was added to the training set and the training redone. The success of such strategy is supported by an example shown in Supplementary Note 2.
Covalent bond. The covalent bonding contribution to the force is approximated as ðF 1 is the component of atomic force projected down the molecular axis. We examined various options for fitting the covalent bond: harmonic, Lennard-Jones and Morse potentials. Although the Morse form provides the best fit across our dataset using simple regression, we adopt the harmonic form in our potential to prevent artificial bond-breaking events at high pressure (>120 GPa). Supplementary Table 1 shows the fitted parameters for the harmonic and Morse form, respectively.
Non-bonded interactions. We describe the short-ranged Coulomb and van der Waals potentials using pairwise functions to create the fingerprint. These are built using Gaussians with a smooth cutoff in the form combined with damped sinusoidal function form where r ij is the distance between atom i and j with η k the range of the kth fingerprint, and f cut (r ij ) is a damping function for atoms within the cutoff distance. These fingerprints are mapped onto the corresponding residual energies, defined by the DFT energies less the contribution from the covalent bonding contribution.
This mapping is achieved using the kernel ridge regression (KRR) method which is capable of handling complex nonlinear relationships 54 . The details of parameterisation and mapping algorithms are shown in Supplementary Note 3.
Orientation-dependent interactions. The orientation-dependent contribution is the last considered. These are fitted to the residuals once covalent and pairwise interactions are subtracted from the DFT energies. The corresponding fingerprints for the orientation-dependent interactions are listed in Supplementary Note 3 and Supplementary Table 2.
The fidelity of our HMLP is evaluated by the comparison of our ML prediction and the DFT calculations in Supplementary Fig. 6. The mean absolute error is of the order of the expected numerical and theoretical accuracy of the reference quantum mechanics-based calculations, indicating good performance of the present ML model. Order parameter and phase boundary for the I-II transition. a Order parameter hOi as a function of temperature for the seven pressures investigated in this work. In all cases there is a sharp decrease from an ordered system (hOi ¼ 1) to a disordered system (hOi ¼ 0). The system is considered to be Phase I after hOi drops discontinuously below 0. Machine-learned molecular dynamics. The simulations were performed using periodic boundary conditions and a time step of 0.5 fs. The Nose-Hoover thermostat and the Parrinello-Rahman barostat 55 were used for controlling temperature and pressure, respectively. All simulations were carried out using the LAMMPS package and the atomic configurations were visualised with the Ato-mEye package. Typical models of the H 2 system was created with P2 1 /c-24 structure containing up to 72,576 molecules. To reproduce the entire temperature-pressure phase diagram, the NPT simulations of 1152-atom supercells of P2 1 /c-24 structure were carried out at selected temperatures and pressures, from which we can identify the corresponding stable phases and melting point via the Zmethod 56 . Furthermore, the phase-coexistence method 57 with co-existing 27,648 molecules of H 2 solid and liquid was adopted to determine the properties of solid and liquid phases at the melting curve. To probe the Phase I-II boundary, 2304 atom supercells of the Phase II P2 1 /c-24 structure were allowed to equilibrate for 250 ps at a series of pressures and temperatures.
Analysis of molecular dynamics. To distinguish between the broken-symmetry structure of Phase II and the free rotors of Phase I we introduce the orientational order parameter O.
Here the summation is over unit cells i, j, each containing a set of basis molecules b. Unit vectorsr i;b andr j;b are oriented along the the H-H bond of the bth molecule in the ith and jth unit cell, respectively, and R ij,b is the distance between the centre of mass of these two molecules. The angled brackets denote a time average. This parameter probes the long-range order in the system relative to the chosen basis, which in our case is the P2 1 /c-24 unit cell 23 . A value of 1 means that the system has the P2 1 /c-24 structure, and a value of 0 suggests that the system is disordered. Note that this order parameter only detects similarity to the given basis and thus a phase change to a structure with a different unit cell will yield an erroneously low value. The trajectories were therefore visually inspected in addition to the order parameter analysis. A 4 × 6 × 4 supercell of P2 1 /c-24 (2304 atoms) was used for Phase II, and the unit cells and basis for this system are illustrated in Supplementary Fig. 7. MD trajectories were calculated for temperatures ranging from 30 to 150 K and pressures from 20 to 160 GPa. After a 50 ps equilibration period, the order parameter O was averaged over the remaining 200 ps of the trajectory. The results are shown in Fig. 6a. The Raman-active phonons were extracted from the MD using the projection method which automatically includes anharmonic effects 36,38 .
We tried numerous approaches to measure the orientation relationship between adjacent molecules, i and j with interatomic vectors σ ! i and σ ! j separated byR ! . For Table 1 we used the explicit equation for linear quadrupoles: where Q = 0.26 DÅ is the quadrupole moment of the H 2 molecule and the orientational factor Γðσ i ! ; σ j ! ;R ! Þ is defined as:

Data availability
Datasets used in generation of the H 2 potential are available at the Edinburgh DataShare https://doi.org/10.7488/ds/2874.

Code availability
Plugin code with which the interatomic potential can be implemented through LAMMPS is available as supplementary online material in Supplementary Dataset 1.