## Introduction

Carbon and hydrogen are, respectively, the fourth most abundant and the most abundant elements in the Universe1, and their mixture is the simplest basis for forming organic compounds. In our Solar System, the Cassini mission revealed lakes and seas of liquid hydrocarbons on the surface of Titan2, and the New Horizons spacecraft detected methane frost on the mountains of Pluto3. In Neptune and Uranus, methane is a major constituent, with a measured carbon concentration of around 2% in the atmosphere4 and an assumed concentration of up to 8% in the interior5. Methane in the atmosphere absorbs red light and reflects blue light, giving the ice giants their blue hues6. Moreover, numerous recently discovered extrasolar planets, some orbiting carbon-rich stars, have spurred renewed interest in the high-pressure and high-temperature behavior of hydrocarbons7.

Diamond formation from C/H mixtures is particularly relevant: the “diamonds in the sky” hypothesis8 suggests that diamonds can form in the mantles of Uranus and Neptune. Diamond formation and the accompanying heat release may explain the long-standing puzzle that Neptune (but not Uranus) radiates much more energy than it receives from the Sun9. Because diamond is dense, it will sink toward the cores of the ice giants. For white dwarfs, Tremblay et al.10 interpreted the crystallization of the carbon-rich cores as influencing the cooling rate.

Many experimental studies have probed diamond formation from C/H mixtures, but these experiments are extraordinarily challenging to perform and interpret because of the extreme thermodynamic conditions, kinetics, chemical inhomogeneities, possible surface effects from the sample containers, and the need to prove diamond formation inside a diamond anvil cell (DAC). Three DAC studies on methane disagree on the temperature range. Benedetti et al. reported diamond formation between 10 and 50 GPa at temperatures of about 2000 K to 3000 K11. Between 10 and 80 GPa, Hirai et al. reported diamond formation above 3000 K12. Lobanov et al. reported elemental carbon at about 1200 K, and a mixture of solid carbon, hydrogen, and heavier hydrocarbons above 1500 K13. In methane hydrates, Kadobayashi et al. reported diamond formation in a DAC between 13 and 45 GPa above 1600 K, but not at lower temperatures14. Laser shock-compression experiments found diamond formation in epoxy (C,H,Cl,N,O)15 and polystyrene (-C8H8-)16, but not in polyethylene (-C2H4-)17.

Moreover, there is a mismatch between experimental results and theoretical predictions, particularly regarding the pressure range of diamond formation. Density functional theory (DFT) combined with crystal structure searches at the static-lattice level predicted that diamond and hydrogen are stable at pressures above about 300 GPa18,19, while hydrocarbon crystals are stable at lower pressures18,19,20,21,22. Based on DFT molecular dynamics (MD) simulations of methane, Ancilotto et al. concluded that methane dissociates into a mixture of hydrocarbons below 100 GPa and is more prone to form diamond above 300 GPa23. Sherman et al. classified the system into three regimes: stable methane molecules (<3000 K), a polymeric state consisting of long hydrocarbon chains (4000–5000 K, 40–200 GPa), and a plasma state (>6000 K)24. However, these simulations are limited to small system sizes and short time scales. This makes it impossible to distinguish the formation of long hydrocarbon chains from the early stage of diamond nucleation. Using a semiempirical carbon model, Ghiringhelli et al.25 found that the diamond nucleation rate in pure liquid carbon is rapid at 85 GPa and 5000 K but negligibly small at 30 GPa and 3750 K. They then extrapolated the nucleation rate to mixtures using an ideal solution model.

In this work, we go beyond standard first-principles methods and study the thermodynamics of diamond formation in C/H mixtures by constructing and using machine learning potentials (MLPs) trained on DFT data. To the best of our knowledge, this is the first MLP fitted for high-pressure mixtures. It is also the only one available for C/H mixtures of arbitrary composition that is applicable from low P-T conditions up to about 8000 K and 800 GPa. We first quantitatively estimate the coexistence line between diamond and pure liquid carbon at planetary conditions. We then reveal the nature of the chemical bonds in C/H mixtures at high-pressure, high-temperature conditions. Finally, we determine the thermodynamic driving force for diamond formation in C/H mixtures, taking into account both the ideal and the non-ideal effects of mixing. We thereby establish the phase boundaries where diamond can form from C/H mixtures at different atomic fractions and P-T conditions.

## Results

### Diamond formation in pure liquid carbon

Although planets or stars typically contain a low percentage of carbon4, it is useful to start with a hypothetical environment of pure carbon. This is to establish the melting line of diamond and to facilitate the subsequent analysis based on C/H mixtures. Moreover, the high-pressure carbon system has experimental relevance in diamond synthesis and Inertial Confinement Fusion applications26.

Figure 1 shows the chemical potential difference ΔμD ≡ μdiamond − μliquidC between the diamond and the pure liquid carbon phases, calculated using our MLP over a wide range of pressures and temperatures. Our calculated melting line Tm of diamond in pure liquid carbon (solid black curve) is compared with other theoretical work and experimental shock-compression data (Fig. 1a). Our Tm is re-entrant above 500 GPa, because liquid carbon is denser than diamond at higher pressures. This shape has been observed for the experimental melting line27. It was previously predicted by DFT simulations of smaller systems28,29,30, but was not captured by free-energy calculations performed with a semi-empirical LCBOP carbon model31.

Although diamond solidification is thermodynamically favorable below the melting line, undercooled liquids can remain metastable for a long time, because solidification is initiated by a kinetically activated nucleation process32. The only previous study that quantified the diamond nucleation rate is that of Ghiringhelli et al.25, which used the LCBOP carbon model. The gray line in Fig. 1 marks their threshold $$J = 10^{-40}\,\mathrm{m^{-3}\,s^{-1}}$$. Above this line, the diamond formation rate is negligible even on celestial scales. Overall, we find that the pure carbon system is deeply undercooled at the P-T conditions in the two icy planets (green and orange lines in Fig. 1).
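The Python sketch below makes “negligible even on celestial scales” concrete. It estimates the expected number of nucleation events, J·V·t, at the threshold rate. The volume (Neptune's whole-planet volume, from a mean radius of about 24,622 km) and the time span (4.5 Gyr) are illustrative assumptions, not values from this work. The carbon-bearing mantle is smaller than the whole planet, so the result is an upper bound.

```python
import math

SECONDS_PER_YEAR = 3.156e7  # approximate length of a year in seconds


def expected_nucleation_events(rate_m3_s: float, radius_m: float, time_s: float) -> float:
    """Expected number of nuclei N = J * V * t in a sphere of the given radius."""
    volume_m3 = 4.0 / 3.0 * math.pi * radius_m**3
    return rate_m3_s * volume_m3 * time_s


# Illustrative assumptions: whole-planet volume of Neptune and 4.5 Gyr of evolution.
n_events = expected_nucleation_events(
    rate_m3_s=1e-40,                  # threshold J from the text (m^-3 s^-1)
    radius_m=2.4622e7,                # Neptune mean radius in m (assumed here)
    time_s=4.5e9 * SECONDS_PER_YEAR,  # assumed elapsed time in s
)
print(f"expected nucleation events: {n_events:.2f}")
```

Even under these generous assumptions, fewer than one nucleation event is expected. This is why rates at or below the threshold can be treated as negligible.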

### The nature of C-H bonds

Going beyond the pure carbon case, we investigate the nature of the chemical bonds in C/H mixtures at conditions relevant for planetary interiors. The high-pressure behavior of hydrocarbons is also crucial in many shock-compression experiments for the development of fusion energy platforms and Inertial Confinement Fusion capsules33. The properties of the covalent C-C and C-H bonds are well-known at ambient conditions, but it is unclear how extreme conditions affect these bonds. DFT studies coupled with harmonic approximations have predicted a variety of hydrocarbon crystals to be stable at P ≤ 300 GPa18,19,20,21,22, but these studies are restricted to low temperatures as the melting lines of hydrogen and methane are below 1000 K and 2000 K12,34, respectively, while harmonic approximations break down completely for these liquids.

We performed MD simulations using our dissociable MLP for C/H mixtures over a wide range of thermodynamic conditions. We focus on the CH4 composition to allow direct comparison with previous studies. Other compositions can be analyzed in the same way and yield qualitatively similar behavior. At T < 2500 K, the MD is not ergodic within the simulation time of 100 ps, so we analyze only temperatures above this threshold. Figure 2a shows snapshots of carbon bonds from the MD simulations of the CH4 system. At 4000 K and P = 100 GPa, 200 GPa, and 600 GPa, the system consists mainly of various types of hydrocarbon chains. The formation of longer chains at higher pressures is consistent with previous DFT MD studies23,24. However, those DFT simulations suffer from severe finite-size effects: polymer chains of just a few carbon atoms can connect with their periodic images and become infinitely long. At high pressures, the chains assemble into carbon networks, and the carbon atoms show more obvious spatial inhomogeneity.

In our chemical bond analysis, a C-C bond is identified whenever the distance between a pair of carbon atoms is within 1.6 Å, and a C-H bond is defined using a cutoff of 1.14 Å. The cutoffs are larger than the typical bond lengths so that thermal fluctuations are not misidentified as broken bonds. The average numbers of C-C and C-H bonds at different conditions are shown in Fig. 2b, c. The number of bonds varies smoothly as a function of P and T.
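The cutoff-based bond criterion can be sketched in Python with NumPy as follows. This is a minimal illustration that assumes an orthorhombic periodic cell and positions in Å. The function name `bonds_per_atom` and the brute-force pair-distance matrix are our own choices. For systems of thousands of atoms, a neighbor list would replace the full distance matrix.

```python
import numpy as np

CC_CUTOFF = 1.6   # Å, C-C bond cutoff from the text
CH_CUTOFF = 1.14  # Å, C-H bond cutoff from the text


def bonds_per_atom(pos_a, pos_b, box, cutoff, same_species=False):
    """Number of B neighbours within `cutoff` of each A atom (minimum-image convention)."""
    d = pos_a[:, None, :] - pos_b[None, :, :]
    d -= box * np.round(d / box)           # wrap displacements into the primary cell
    bonded = np.linalg.norm(d, axis=-1) < cutoff
    if same_species:
        np.fill_diagonal(bonded, False)    # an atom is not bonded to itself
    return bonded.sum(axis=1)


box = np.array([10.0, 10.0, 10.0])
carbon = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
hydrogen = np.array([[9.5, 0.0, 0.0]])    # 0.5 Å from the first carbon across the boundary

n_cc = bonds_per_atom(carbon, carbon, box, CC_CUTOFF, same_species=True)
n_ch = bonds_per_atom(carbon, hydrogen, box, CH_CUTOFF)
print(n_cc, n_ch)  # [1 1] [1 0]
```

Averaging `n_cc` and `n_ch` over carbon atoms and frames gives the kind of per-carbon bond counts plotted in Fig. 2b, c.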

The average number of C-C bonds decreases with temperature. Moreover, the Supplementary Information shows that, at a given condition, the carbon atoms have varying numbers of C-C bonds rather than all having the same number. These observations suggest that at T ≥ 2500 K the system does not consist of the hydrocarbon crystals that previous DFT studies predicted to be stable at low temperatures18,19,20,21,22. The average number of C-H bonds per carbon atom is close to one at all conditions considered here, even though the overall composition is CH4. This indicates that most hydrogen atoms are not bonded to any carbon.

To determine the lifetimes of the C-C and C-H bonds, we recorded how long each newly formed bond survives before dissociating during the MD simulations. Figure 2d, e show the average bond lifetimes. The C-H bond lifetimes are extremely short, less than about 0.01 ps. The C-C bonds are longer-lived, yet their mean lifetime is less than about 1 ps at all the conditions considered here. Such short lifetimes are consistent with previous DFT MD simulations of CH424. The short bond lifetimes indicate that the hydrocarbon chains in the system rapidly decompose and re-form. In other words, the C/H mixture behaves like a liquid with transient C-C and C-H bonds.
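One simple way to extract such lifetimes from a trajectory is sketched below. For each atom pair, it takes the bonded/unbonded time series (for example, from the distance cutoffs defined earlier in this section) and measures how long each newly formed bond survives. This is an illustrative Python sketch, not the authors' analysis code, and `bond_lifetimes` is a hypothetical name. It assumes frames are saved every `dt` picoseconds. It skips bonds already present in the first frame or still intact in the last one, and it cannot resolve lifetimes shorter than the output interval.

```python
import numpy as np


def bond_lifetimes(bonded, dt):
    """Lifetimes of newly formed bonds.

    bonded: boolean array (n_frames, n_pairs), True where a pair is bonded in a frame.
    dt: time between saved frames (e.g. in ps).
    Bonds present in the first frame or still intact in the last frame are skipped,
    because their formation or dissociation time is not observed.
    """
    lifetimes = []
    for series in np.asarray(bonded, dtype=bool).T:
        change = np.diff(series.astype(np.int8))
        formed = np.flatnonzero(change == 1) + 1    # first frame in which the bond exists
        broken = np.flatnonzero(change == -1) + 1   # first frame in which it no longer exists
        for start in formed:
            later = broken[broken > start]
            if later.size:
                lifetimes.append((later[0] - start) * dt)
    return np.array(lifetimes)


# Two hypothetical pairs sampled every 0.1 ps.
pair_0 = [0, 1, 1, 0, 0, 1, 1, 1, 0]   # two complete bonding events
pair_1 = [1, 1, 0, 0, 0, 0, 0, 1, 1]   # pre-existing and unfinished bonds: both skipped
series = np.array([pair_0, pair_1], dtype=bool).T
print(bond_lifetimes(series, dt=0.1))   # [0.2 0.3]
```

Averaging the returned lifetimes separately for C-C and C-H pairs gives mean lifetimes of the kind shown in Fig. 2d, e.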

### Thermodynamics of C/H mixtures

We then determine the chemical potentials of carbon in C/H mixtures, ΔμC(χC), as a function of the atomic fraction of carbon, χC = NC/(NC + NH). This, combined with the chemical potential difference ΔμD between diamond and pure carbon liquid, establishes the thermodynamic phase boundary for diamond formation from C/H mixtures with varying atomic ratios.

Dilution will usually lower the chemical potential of carbon in a mixture, which can be understood from the ideal solution assumption: $${\mu }_{id}^{C}={k}_{\mathrm{B}}T\ln ({\chi }_{C})$$. However, the ideal solution model neglects the atomic interactions. To consider non-ideal mixing effects, we compute the chemical potentials of mixtures using the MLP. This is not an easy task, because traditional particle insertion methods35 fail for this dense liquid system, and thermodynamic integration from an ideal gas state to the real mixture36 is not compatible with the MLP. We employ the newly developed S0 method which accounts for both the ideal and the non-ideal contributions to the chemical potentials37:

$${\left(\frac{d{\mu }^{C}}{d\ln {\chi }_{C}}\right)}_{T,P}=\frac{{k}_{B}T}{(1-{\chi }_{C}){S}_{CC}^{0}+{\chi }_{C}{S}_{HH}^{0}-2\sqrt{{\chi }_{C}(1-{\chi }_{C})}{S}_{CH}^{0}},$$
(1)

where $${S}_{CC}^{0}$$, $${S}_{CH}^{0}$$, and $${S}_{HH}^{0}$$ are the static structure factors between the indicated atom types in the limit of infinite wavelength (k → 0)37. They can be determined from equilibrium MD simulations of a C/H mixture with a given carbon fraction χC. μH is then obtained from the Gibbs-Duhem equation. Only relative chemical potentials are physically meaningful, so we choose the pure carbon and hydrogen liquids as reference states, i.e. μC(χC = 1) = 0 and μH(χC = 0) = 0. We obtained μC and μH at different χC on a grid of P-T conditions spanning 10–600 GPa and 3000–8000 K by numerically integrating $$d{\mu }^{C}/d\ln {\chi }_{C}$$.
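The integration step can be sketched in Python as follows. The function `dmu_dlnchi_s0` evaluates Eq. (1) from given long-wavelength structure factors. `integrate_chemical_potentials` then integrates in χC with the reference states μC(χC = 1) = 0 and μH(χC = 0) = 0. For hydrogen it uses the Gibbs-Duhem relation χC dμC + (1 − χC) dμH = 0. Both function names are hypothetical. The sketch assumes that dμC/dlnχC has already been evaluated (or extrapolated) on a grid that includes both χC = 0 and χC = 1, and it uses plain trapezoidal integration.

```python
import numpy as np


def dmu_dlnchi_s0(chi, s_cc, s_hh, s_ch, kT=1.0):
    """Eq. (1): (d mu^C / d ln chi_C)_{T,P} from long-wavelength partial structure factors."""
    chi = np.asarray(chi, float)
    denom = (1.0 - chi) * s_cc + chi * s_hh - 2.0 * np.sqrt(chi * (1.0 - chi)) * s_ch
    return kT / denom


def _cumulative_trapezoid(y, x):
    """Running trapezoidal integral of y(x), starting from 0 at x[0]."""
    out = np.zeros(len(x))
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return out


def integrate_chemical_potentials(chi, g):
    """Integrate g = d mu^C / d ln chi_C on an increasing grid that includes chi = 0 and 1.

    Returns (chi_c, mu_c) on chi > 0 with mu_C(1) = 0, and (chi_h, mu_h) on chi < 1 with
    mu_H(0) = 0. Uses d mu_C/d chi = g/chi and, from Gibbs-Duhem, d mu_H/d chi = -g/(1 - chi).
    """
    chi = np.asarray(chi, float)
    g = np.asarray(g, float)
    c = chi > 0.0
    cum_c = _cumulative_trapezoid(g[c] / chi[c], chi[c])
    mu_c = cum_c - cum_c[-1]                 # mu_C(chi) = -int_chi^1 (g/chi') dchi'
    h = chi < 1.0
    mu_h = -_cumulative_trapezoid(g[h] / (1.0 - chi[h]), chi[h])
    return chi[c], mu_c, chi[h], mu_h


# Sanity check with ideal-mixing structure factors (S0_CC = S0_HH = 1, S0_CH = 0), kT = 1.
chi = np.linspace(0.0, 1.0, 2001)
g = dmu_dlnchi_s0(chi, s_cc=1.0, s_hh=1.0, s_ch=0.0)
chi_c, mu_c, chi_h, mu_h = integrate_chemical_potentials(chi, g)
i = np.argmin(np.abs(chi_c - 0.5))
print(mu_c[i], np.log(chi_c[i]))   # both close to ln(0.5)
```

With these ideal inputs, the denominator of Eq. (1) equals one. The sketch then recovers μC = kBT ln χC and μH = kBT ln(1 − χC), which is a convenient sanity check before using structure factors from simulation.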

$$d{\mu }^{C}/d\ln {\chi }_{C}$$ at P = 50 GPa, T = 4000 K and at P = 400 GPa, T = 3000 K is shown in Fig. 3a, d, respectively. In both cases, the values deviate from the ideal behavior (i.e., a constant value of 1 in units of kBT), with maxima and minima around certain compositions. The corresponding chemical potentials are plotted in Fig. 3b, e, and the results at other conditions are shown in Fig. 4 of the Methods. As an independent validation, we also computed μC using the coexistence method described in the Methods. That approach is generally less efficient and can become prohibitive when the carbon concentration or diffusivity is low. The coexistence values, shown as hollow symbols in Fig. 3b, agree with the S0 method. Because the S0 method is statistically much more accurate than the coexistence approach, all subsequent analysis is based on the former.

In both Fig. 3b and e, μC has a plateau at χC between about 0.25 and 0.35. The same plateau appears at T ≤ 5000 K at 50 GPa, and over an increasingly broad temperature range at higher pressures, up to 8000 K at 600 GPa (see Fig. 4 of the Methods). At 50 GPa and 4000 K (Fig. 3b), μC then decreases rapidly and approaches the ideal behavior at lower χC. In contrast, at 400 GPa and 3000 K (Fig. 3e), μC plateaus at a constant value for χC < 0.12. Such low-χC plateaus were observed at pressures between 200 GPa and 600 GPa and temperatures below 3500 K (see Fig. 4 of the Methods). In Fig. 3b, e, black diamond symbols and horizontal lines mark the chemical potentials of diamond, μD. If μC is larger than μD at a given χC, diamond formation is thermodynamically favorable.

To rationalize the plateaus, we express the per-atom chemical potential of the C/H mixture as

$${\mu }_{mixture}({\chi }_{C})={\chi }_{C}{\mu }^{C}({\chi }_{C})+(1-{\chi }_{C}){\mu }^{H}({\chi }_{C}),$$
(2)

and compare it with the ideal solution curve $${\mu }_{mixture,id}={k}_{\mathrm{B}}T\left[{\chi }_{C}\ln ({\chi }_{C})+(1-{\chi }_{C})\ln (1-{\chi }_{C})\right]$$. Figure 3c shows μmixture at 50 GPa and 4000 K. The ideal solution curve (dashed gray curve) is fully convex, whereas μmixture has two edges (kinks). A common tangent construction on the μmixture curve therefore identifies the coexisting liquid phases. The green line in Fig. 3c indicates the common tangent, and the two green crosses show the locations of the edges. At the condition shown, the two atomic ratios are $${\chi }_{C}^{1}=0.27$$ and $${\chi }_{C}^{2}=0.36$$. A C/H mixture with χC between them undergoes a liquid-liquid phase separation (PS1) into two phases, whose proportions follow the lever rule. The region between the two edges is linear rather than concave, because the phase separation has almost no activation barrier and already occurs during the MD simulations. In other words, a C/H mixture with a carbon fraction between $${\chi }_{C}^{1}$$ and $${\chi }_{C}^{2}$$ first separates spontaneously into two liquids. This explains the corresponding plateaus in μC in Fig. 3b, e.
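Numerically, a common tangent construction on a sampled μmixture(χC) curve is equivalent to finding its lower convex hull. Hull edges that skip over grid points bridge a non-convex region, and their endpoints mark the compositions of the coexisting phases. The Python sketch below uses Andrew's monotone-chain algorithm for the lower hull. The function names and the double-well test curve are hypothetical. Noisy simulation data would need smoothing or a tolerance before being passed in.

```python
import numpy as np


def lower_hull(x, y):
    """Indices of the lower convex hull of points sorted by x (Andrew's monotone chain)."""
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            cross = (x[i2] - x[i1]) * (y[i] - y[i1]) - (y[i2] - y[i1]) * (x[i] - x[i1])
            if cross <= 0.0:      # point i2 lies on or above the chord from i1 to i: drop it
                hull.pop()
            else:
                break
        hull.append(i)
    return hull


def coexistence_intervals(chi, mu_mix, min_skip=1):
    """Composition intervals bridged by a common tangent (hull edges skipping grid points)."""
    hull = lower_hull(chi, mu_mix)
    return [(chi[a], chi[b]) for a, b in zip(hull, hull[1:]) if b - a > min_skip]


# Ideal mixing (kT = 1) is strictly convex: no coexistence.
chi = np.linspace(0.001, 0.999, 999)
ideal = chi * np.log(chi) + (1.0 - chi) * np.log(1.0 - chi)
print(coexistence_intervals(chi, ideal))          # []

# A hypothetical double-well curve with minima at 0.3 and 0.7.
chi2 = np.linspace(0.0, 1.0, 1001)
double_well = (chi2 - 0.3) ** 2 * (chi2 - 0.7) ** 2
print(coexistence_intervals(chi2, double_well))   # one interval, close to (0.3, 0.7)
```

The strictly convex ideal-solution curve yields no bridging edges. The double-well curve yields a single edge spanning χ ≈ 0.3 to 0.7, which plays the role of the green common tangent.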

Furthermore, Fig. 3f shows that, at 400 GPa and 3000 K, μmixture at low χC deviates significantly from the ideal solution approximation (dashed gray curve), and a tangent can be constructed (purple line). This means that, besides PS1, C/H mixtures at a low C fraction can also phase separate (PS2). The two products are a fluid of mostly hydrogen and another fluid with χC ≈ 0.12 (purple cross). Example snapshots of such phase-separated configurations from the MD simulations are shown in the Supplementary Information, and Supplementary Movie 1 shows PS2 occurring in an MD simulation. PS2 explains the plateau of μC at low χC in Fig. 3e: the carbon concentrations of the two phase-separated liquids stay fixed, and only their proportions change. This phase separation has far-reaching consequences. At pressures above 200 GPa and temperatures below 3000–3500 K, PS2 ensures that carbon in C/H mixtures always has μC > μD, even at very low C fractions. The carbon atoms therefore always experience a thermodynamic driving force to form diamond. We refer to these conditions as the “depletion zone”.

Figure 3g presents the thermodynamic phase boundaries below which diamond formation is possible in C/H mixtures for each indicated carbon atomic fraction. These boundaries are obtained by combining μC(χC) in C/H mixtures with μD over a wide range of P-T conditions. As χC decreases, the boundaries deviate increasingly from the Tm of diamond. At P < 100 GPa, the boundaries are very sensitive to both temperature and pressure, whereas at higher P they are mostly independent of pressure. Figure 3g can also be read in another way: for a given P-T condition, it gives the minimal carbon fraction required for diamond formation to be possible. The χC = 0.25 and χC = 0.3 lines almost overlap because of the plateau in μC induced by PS1. The light blue shaded area indicates the depletion zone, where PS2 makes diamond formation always possible. In this zone, carbon atoms first form a carbon-rich liquid phase, from which diamond can nucleate. Such a process resembles the two-step nucleation mechanism previously revealed in protein systems38.
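Reading Fig. 3g “the other way” amounts to finding, at fixed P and T, the smallest χC for which μC(χC) ≥ μD. The Python sketch below does this under stated assumptions. μC is tabulated on an increasing χC grid and is non-decreasing in χC. μC and μD share the pure-liquid-carbon reference, so μD equals ΔμD. The function name `min_carbon_fraction` is hypothetical.

```python
import numpy as np


def min_carbon_fraction(chi, mu_c, mu_d):
    """Smallest chi_C with mu_C(chi_C) >= mu_D, by linear interpolation between grid points.

    Assumes chi is increasing and mu_C is non-decreasing in chi (plateaus allowed), with
    mu_C and mu_D measured from the same pure-liquid-carbon reference.
    Returns None if diamond formation is not favorable anywhere on the grid.
    """
    chi = np.asarray(chi, float)
    mu_c = np.asarray(mu_c, float)
    favorable = mu_c >= mu_d
    if not favorable.any():
        return None
    i = int(np.argmax(favorable))       # first grid point where diamond is favorable
    if i == 0:
        return float(chi[0])            # favorable down to the lowest sampled fraction
    x0, x1 = chi[i - 1], chi[i]
    y0, y1 = mu_c[i - 1], mu_c[i]
    return float(x0 + (mu_d - y0) * (x1 - x0) / (y1 - y0))


# Ideal-solution check in units of kT: mu_C = ln(chi), so mu_D = ln(0.2) gives chi_min = 0.2.
chi = np.linspace(0.01, 1.0, 100)
print(min_carbon_fraction(chi, np.log(chi), np.log(0.2)))
```

A return value equal to the lowest sampled fraction corresponds to the depletion zone, where μC exceeds μD everywhere on the grid.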

Figure 3g also shows previous experimental measurements, marking where diamonds were found (diamond symbols or rectangular regions) or absent (cross symbols). At lower pressures, our calculations largely agree with the diamond formation observed from methane in DAC experiments between 2000 and 3000 K11 and above 3000 K12. We find less agreement with the shock-compression experiments at higher pressures16,17. We speculate on two possible causes. Diamond formation must proceed through an activated nucleation process, which may take longer than the short timescale of these rapid compression experiments. Alternatively, temperatures are difficult to estimate in these experiments. The hollow diamond symbols in Fig. 3g show diamond formation conditions from starting materials of more complex composition: Marshall et al.15 used epoxy (C:H:Cl:N:O ≈ 27:38:1:1:5) and Kadobayashi et al.14 used methane hydrate. We find little agreement between Kadobayashi et al.14 and our phase boundaries if χC is computed over all atomic species. The agreement improves if χC is based on methane alone, and CH4 may indeed be an intermediate product in that experiment14. The liquid-liquid phase separations of C/H mixtures have not been observed previously. They may be detectable through the speed of sound, mixed optical spectra, inhomogeneity in the diamond formation reaction, or hydrodynamic instabilities during compression experiments.

## Discussion

We first computed the melting line of diamond in pure liquid carbon. We then moved on to C/H mixtures and showed that they behave like liquids at T ≥ 2500 K. Finally, we computed the thermodynamic boundaries of diamond formation for different atomic ratios. Notably, we revealed phase separations in C/H mixtures that can greatly enhance diamond formation. In PS1, the C/H mixture separates into two liquids with χC of about 0.25 and 0.35. Both liquids have the same μC, but their interfacial free energies with diamond differ. Diamond will therefore preferentially nucleate from the liquid with the lower interfacial free energy. At 200 < P < 600 GPa and T below 3000–3500 K, there is a depletion zone. There, C/H mixtures at a low C fraction can phase separate (PS2) into a fluid of mostly hydrogen and a more carbon-rich fluid with χC ≈ 0.12. In this zone, there is always a thermodynamic driving force to form diamond from the carbon-rich phase.

Our phase boundaries in Fig. 3g put widely scattered experimental measurements11,12,15,16,17 into context and provide a mechanistic understanding of the thermodynamics involved. They can also help gauge the accuracy of experimentally determined diamond formation conditions, interpolate between different experiments, and guide future efforts to validate these boundaries.

Note that our boundaries are based solely on a thermodynamic criterion. The nucleation rate may also play a role, particularly in shock-compression experiments. Some undercooling may be needed for diamond to nucleate from C/H mixtures within finite time, depending on the height of the nucleation barrier. In homogeneous nucleation, the interfacial free energy contribution is crucial25,32. In DAC experiments, the fluid sample is in close contact with the cell walls, so heterogeneous nucleation may occur, which requires less undercooling than homogeneous nucleation. In addition, other elements (e.g. He, N, O) are also prevalent in icy planets, and we suggest that future experiments probe how they affect the phase boundaries of diamond formation.

The “depletion zone” can help explain the difference in luminosity between Uranus and Neptune. Although the two planets are similar in size and composition, Neptune has a strong internal heat source and Uranus does not39. The “diamonds in the sky” hypothesis8,11 relates the heat source to diamond formation, but does not explain the dichotomy between the two planets. Comparing the P-T conditions at different depths of the two ice giants from Ref. 40 with our phase boundaries (Fig. 3g) shows that a relatively small difference in the planetary profile can drastically change the possibility of diamond formation. At the P-T conditions in Uranus, diamond formation requires about 15% carbon. This seems unlikely, as its mantle is believed to contain less than 10% carbon4, so diamond formation may be absent in Uranus. In contrast, Neptune is somewhat cooler, so its planetary profile is much more likely to overlap with the depletion zone. Under those conditions, C/H mixtures phase separate (PS2), and diamond formation is thermodynamically favorable regardless of the actual carbon fraction. If such an overlap exists, diamonds can in principle form in the depletion zone in Neptune's mantle and then sink towards the core, releasing heat. Although the mantle becomes increasingly carbon-depleted, diamond formation in the depletion zone can proceed until all carbon is exhausted. Moreover, the “diamond rain” will naturally induce a compositional gradient inside the planet, which is important for explaining the evolution of giant planets41,42.

Our carbon-ratio-dependent phase boundaries for diamond formation can help estimate how prevalent extraterrestrial diamonds are and under what conditions they exist. Neptune-like exoplanets are extremely common according to the database of discovered planets43. Methane-rich exoplanets are modeled as having a carbon core, a methane envelope, and a hydrogen atmosphere7. Our boundaries can put a tight constraint on the structure and composition of these planets. Furthermore, diamond formation and liquid-liquid phase separation play a key role in the cooling of white dwarfs10, so precisely determining the onset of phase separation and crystallization is also crucial there.

## Methods

### DFT calculations

DFT is the workhorse of high-pressure equation-of-state calculations. It has shown good agreement with experiments on hydrocarbons and other systems44,45,46 for measured thermodynamic properties, in particular Hugoniot curves. To generate the MLP training set, we carried out single-point DFT calculations with VASP47,48,49,50 for configurations with various C/H ratios. These calculations used the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional51, hard pseudopotentials for hydrogen and carbon, a cutoff energy of 1000 eV, and a consistent k-point spacing of 0.2 Å⁻¹. In addition, we performed extensive PBE MD simulations for CH16, CH8, CH4, CH2, CH, C2H and C4H mixtures. Together with previous PBE MD data for methane52, carbon30 and hydrogen53, these were used to benchmark the MLP. To approximate the effect of thermal excitation of the electrons, we set the electronic temperature equal to the average ionic temperature. We did this both in the DFT MD and in the reference calculations used to train and test the MLP. Convergence tests of the DFT calculations and the influence of the electronic temperature are provided in the Supplementary Information.
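The settings above might be expressed as VASP INCAR tags roughly as in the Python sketch below, which generates the INCAR text. This mapping is our assumption, not the authors' input files. `GGA = PE` selects PBE. Fermi-Dirac smearing (`ISMEAR = -1`) with `SIGMA` = kBT in eV sets the electronic temperature equal to the ionic temperature. `KSPACING` carries the k-point spacing, but VASP's definition includes a factor of 2π, so the convention behind the quoted 0.2 Å⁻¹ should be checked. The hard pseudopotentials would correspond to hard PAW datasets in the POTCAR file (e.g. `H_h` and `C_h`), which are chosen outside the INCAR.

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def electronic_sigma_ev(temperature_k: float) -> float:
    """Fermi-Dirac smearing width equal to k_B*T, i.e. electronic T = ionic T."""
    return K_B_EV_PER_K * temperature_k


def incar_text(temperature_k: float) -> str:
    """Assumed INCAR tags mirroring the DFT settings described in the text."""
    tags = {
        "GGA": "PE",        # PBE exchange-correlation functional
        "ENCUT": 1000,      # plane-wave cutoff energy (eV)
        "KSPACING": 0.2,    # k-point spacing (1/Å); check VASP's 2*pi convention
        "ISMEAR": -1,       # Fermi-Dirac smearing
        "SIGMA": f"{electronic_sigma_ev(temperature_k):.5f}",  # k_B*T in eV
    }
    return "\n".join(f"{key} = {value}" for key, value in tags.items())


print(incar_text(4000.0))
```

At 4000 K, this gives a smearing width of about 0.345 eV.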

### Machine learning potential

We generated flexible and dissociable MLPs for the high-pressure C/H system using the Behler-Parrinello artificial neural network54 and the N2P2 code55. The training set contains 92,185 configurations with a total of 8,906,582 atoms. It was constructed by a combination of strategies: DFT MD, random structure searches, adaptation of previous training sets for pure C56 and H53, and active learning. The training set covers a large variety of structures. These include cubic/hexagonal diamond, graphite, graphane, carbon nanotubes, fullerenes, amorphous carbon, carbon structures with defects, liquid carbon, liquid hydrogen, many crystalline hydrogen polymorphs, hydrocarbon crystals, and hydrocarbon liquids with varying carbon concentrations over a wide range of P-T conditions. Details on the construction and benchmarks of the MLP are provided in the Supplementary Information. The MLP has been extensively benchmarked for high-pressure liquid hydrogen, diamond/liquid carbon, and C/H mixtures on energetic, thermodynamic and dynamic properties. However, the current MLP has limitations. It is not applicable to gas-phase hydrocarbons. It has not been extensively tested for low-pressure carbon phases and diamond-graphite transitions, and for these we recommend the MLP from Ref. 56 instead. Long-range van der Waals interactions may be important in liquid methane at low density57, but they are missing from the current MLP. The Supplementary Information compares PBE and the MLP for structural and dynamic properties, including equations of state, radial distribution functions, diffusivity, vibrational density of states, and bond lifetimes. We recommend checking these comparisons before applying the MLP to a given C/H composition and set of conditions.

### MLP MD simulation details

All MD simulations were performed in LAMMPS58 with an MLP implementation59. The time step was 0.25 fs for C/H mixtures and 0.4 fs for pure carbon systems.

### Computing the chemical potentials of diamond and pure liquid carbon

We computed ΔμD using interface pinning simulations60 performed with the PLUMED code61. We used solid-liquid systems of 1,024 C atoms, described by the MLP, at pressures between 10 and 800 GPa and at temperatures close to the melting line. In these coexistence simulations, the Nosé-Hoover barostat acted only along the z direction, perpendicular to the interface. The supercell dimensions along x and y were commensurate with the equilibrium lattice parameters of diamond at the given conditions. We used the locally averaged62 Q3 order parameter63 to detect diamond structures. An umbrella potential counterbalanced the chemical potential difference and constrained the size of the diamond crystal. We then used thermodynamic integration along isotherms and isobars64,65 to extend ΔμD to a wide range of pressures and temperatures.
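The extension of ΔμD from the interface-pinning reference points can be sketched with standard thermodynamic relations. Along an isobar, d(ΔμD/T)/dT = −Δh/T². Along an isotherm, dΔμD/dP = Δv. Here Δh and Δv are the per-atom enthalpy and volume differences between diamond and liquid carbon. In the Python sketch below, these differences are assumed to come from separate bulk NPT simulations of each phase. Units are eV per atom, Å³ per atom and GPa. The function names are hypothetical.

```python
import numpy as np

EV_PER_GPA_A3 = 1e-21 / 1.602176634e-19  # 1 GPa x 1 Å^3 = 1e-21 J, expressed in eV


def _cumulative_trapezoid(y, x):
    """Running trapezoidal integral of y(x), starting from 0 at x[0]."""
    out = np.zeros(len(x))
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return out


def delta_mu_isobar(T, delta_h, delta_mu0):
    """Integrate d(dmu/T)/dT = -dh/T^2 from T[0], where dmu = delta_mu0 (eV/atom).

    T: temperatures in K (ascending or descending); delta_h: h_diamond - h_liquid in eV/atom.
    """
    T = np.asarray(T, float)
    dh = np.broadcast_to(np.asarray(delta_h, float), T.shape)
    return T * (delta_mu0 / T[0] + _cumulative_trapezoid(-dh / T**2, T))


def delta_mu_isotherm(P, delta_v, delta_mu0):
    """Integrate d(dmu)/dP = dv from P[0]; P in GPa, delta_v = v_diamond - v_liquid in Å^3/atom."""
    P = np.asarray(P, float)
    dv = np.broadcast_to(np.asarray(delta_v, float), P.shape)
    return delta_mu0 + EV_PER_GPA_A3 * _cumulative_trapezoid(dv, P)


# Constant latent heat: dmu(T) = dh * (1 - T/Tm); here dh = -1 eV/atom and Tm = 5000 K.
T = np.linspace(5000.0, 3000.0, 2001)
print(delta_mu_isobar(T, -1.0, 0.0)[-1])   # close to -0.4
```

With a constant Δh, the isobar result reduces to ΔμD(T) = Δh(1 − T/Tm). A positive Δv (liquid denser than diamond) makes ΔμD increase with pressure, consistent with the re-entrant shape of the melting line.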

### MLP MD simulation of CH4

The simulation cell contained 7,290 atoms (1,458 CH4 formula units). Each simulation was run for more than 100 ps. The simulations were performed in the NPT ensemble, using the Nosé-Hoover thermostat and isotropic barostat. At each condition, two independent MD simulations were initialized using a starting configuration of either bonded CH4 molecules on a lattice or a liquid. For T≥2500 K, the two simulations provided consistent statistical properties. These simulations were the basis for the further analysis we performed. For T < 2500 K the two runs gave different averages, meaning that under these conditions the system is not ergodic within the simulation time.
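A simple way to decide whether the two independent runs agree is to compare their means within block-averaged standard errors, as in the Python sketch below. The number of blocks, the 3σ threshold and the function names are illustrative assumptions, not the criterion used in this work. The choice of observable (e.g. the potential energy or the mean number of C-C bonds) is left to the user.

```python
import numpy as np


def block_mean_sem(x, n_blocks=10):
    """Mean and block-averaged standard error of a time series (trailing samples dropped)."""
    x = np.asarray(x, float)
    size = len(x) // n_blocks
    blocks = x[: size * n_blocks].reshape(n_blocks, size).mean(axis=1)
    return blocks.mean(), blocks.std(ddof=1) / np.sqrt(n_blocks)


def runs_agree(a, b, n_blocks=10, n_sigma=3.0):
    """True if two runs have means consistent within n_sigma combined standard errors."""
    mean_a, sem_a = block_mean_sem(a, n_blocks)
    mean_b, sem_b = block_mean_sem(b, n_blocks)
    return bool(abs(mean_a - mean_b) <= n_sigma * np.hypot(sem_a, sem_b))


# Hypothetical observable from a lattice-started and a liquid-started run.
run_lattice = np.tile([0.0, 2.0], 500)
run_liquid = np.tile([2.0, 0.0], 500)
print(runs_agree(run_lattice, run_liquid))        # True: identical means
print(runs_agree(run_lattice, run_liquid + 1.0))  # False: the means differ
```

Blocks should be longer than the correlation time of the observable. Otherwise the standard errors are underestimated and the test becomes too strict.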

### Computing the chemical potentials of C in C/H mixtures

We used two independent methods to compute the chemical potentials of carbon in C/H mixtures at various conditions. The first is the S0 method37, which uses static structure factors computed from equilibrium NPT simulations. It exploits the thermodynamic relationship between composition fluctuations and the derivative of the chemical potential with respect to concentration, and accounts for both mixing entropy and enthalpy37. The simulations were performed on a grid of P-T conditions, with P = 10, 25, 50, 100, 200, 300, 400, and 600 GPa, and T = 3500, 4000, 5000, 6000, 7000, and 8000 K. At each P-T condition, we ran MD simulations for a dense grid of atomic ratios, with χC from 0.015 to 0.98. The systems contained between 9,728 and 82,944 atoms in total. We obtained the static structure factors at different wavevectors k from the Fourier expansion of the scaled atomic coordinates, i.e.

$$S_{AB}(\mathbf{k})=\frac{1}{\sqrt{N_A N_B}}\left\langle \sum_{i=1}^{N_A}\exp\left(\mathrm{i}\,\mathbf{k}\cdot\hat{\mathbf{r}}_{i_A}(t)\right)\sum_{j=1}^{N_B}\exp\left(-\mathrm{i}\,\mathbf{k}\cdot\hat{\mathbf{r}}_{j_B}(t)\right)\right\rangle$$
(3)

where AB can be CC, CH or HH, $$\hat{\mathbf{r}}(t)=\mathbf{r}(t)\,{\langle l\rangle }_{\mathrm{NPT}}/l(t)$$, and l(t) is the instantaneous dimension of the supercell. We then determined $${S}_{CC}^{0}$$, $${S}_{CH}^{0}$$ and $${S}_{HH}^{0}$$ by extrapolating SAB(k) to k → 0 with the Ornstein–Zernike form, as described in Ref. 37. Finally, we numerically integrated Eq. (1) of the main text to obtain the chemical potential of carbon at different atomic fractions, and obtained the chemical potential of H from the Gibbs-Duhem equation. All the chemical potential data are presented in Fig. 4.
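Eq. (3) and the k → 0 extrapolation can be sketched in Python as below. This illustrative implementation assumes an orthorhombic cell and wavevectors along the cell axes that are commensurate with the mean box. It also assumes the Lorentzian Ornstein–Zernike form S(k) ≈ S0/(1 + ξ²k²), fitted through 1/S versus k². That linearized fit only applies to positive S(k), so the cross term S_CH generally needs a direct nonlinear fit. The exact fitting protocol of Ref. 37 may differ, and the function names are hypothetical.

```python
import numpy as np


def scaled_positions(pos, box, mean_box):
    """r_hat(t) = r(t) * <l>_NPT / l(t), applied per axis (orthorhombic cell assumed)."""
    return pos * (np.asarray(mean_box, float) / np.asarray(box, float))


def axis_kvecs(mean_box, n_max=3):
    """Wavevectors k = 2*pi*n/<L> along each cell axis, commensurate with the mean box."""
    kvecs = []
    for axis in range(3):
        for n in range(1, n_max + 1):
            k = np.zeros(3)
            k[axis] = 2.0 * np.pi * n / mean_box[axis]
            kvecs.append(k)
    return np.array(kvecs)


def partial_structure_factor(frames_a, frames_b, kvecs):
    """Eq. (3): <rho_A(k) rho_B(-k)> / sqrt(N_A N_B), averaged over frames of scaled positions."""
    acc = np.zeros(len(kvecs))
    for ra, rb in zip(frames_a, frames_b):
        rho_a = np.exp(1j * ra @ kvecs.T).sum(axis=0)
        rho_b = np.exp(1j * rb @ kvecs.T).sum(axis=0)
        acc += np.real(rho_a * np.conj(rho_b))
    n_a, n_b = len(frames_a[0]), len(frames_b[0])
    return acc / (len(frames_a) * np.sqrt(n_a * n_b))


def oz_extrapolate(k, s):
    """Fit S(k) = S0 / (1 + xi^2 k^2) via 1/S = 1/S0 + (xi^2/S0) k^2; requires S(k) > 0."""
    slope, intercept = np.polyfit(np.asarray(k) ** 2, 1.0 / np.asarray(s), 1)
    return 1.0 / intercept


# Example: an uncorrelated "ideal" C/H mixture in a fixed cubic box (no rescaling needed).
rng = np.random.default_rng(0)
L = np.array([20.0, 20.0, 20.0])
frames_c = [rng.uniform(0.0, 20.0, size=(200, 3)) for _ in range(200)]
frames_h = [rng.uniform(0.0, 20.0, size=(400, 3)) for _ in range(200)]
kvecs = axis_kvecs(L, n_max=2)
s_cc = partial_structure_factor(frames_c, frames_c, kvecs)
s_ch = partial_structure_factor(frames_c, frames_h, kvecs)
print(s_cc.mean(), s_ch.mean())   # close to 1 and 0, respectively

# OZ extrapolation on synthetic data with S0 = 2 and xi = 2.
k = np.linspace(0.1, 0.5, 5)
print(oz_extrapolate(k, 2.0 / (1.0 + 4.0 * k**2)))
```

For an uncorrelated mixture, S_CC ≈ 1 and S_CH ≈ 0 at all nonzero k, which gives a quick check of the normalization in Eq. (3).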

The second approach is based on the coexistence method, with a setup similar to that used for the pure carbon systems. Here, interface pinning simulations60,66 were performed on a diamond/C-H-liquid coexistence system containing 1,024 C atoms and a varying number of H atoms, at pressures of 0–600 GPa. A snapshot of the coexistence system is provided in the Supplementary Information. The chemical potentials from coexistence are shown in Fig. 4. The error bars are standard errors of the mean, estimated from the values of the collective variable (CV). Other sources of error are hard to estimate: finite-size effects and ergodicity issues related to the explicit interface, and variation of the carbon concentration within the liquid region of the simulation box.