Introduction

The Earth’s core is composed mostly of Fe, alloyed with ~5 wt% Ni and a certain amount of light elements including Si, S, O, C, and H1,2,3. The presence of light elements is required to explain the reduced densities of both the liquid outer core (by ~6–8%) and the solid inner core (by ~4–5%) compared with those of pure Fe2,3. In addition, the compressional wave velocity (VP) of the outer core is ~4–5% higher than that of liquid Fe because of light elements2. Using these density and velocity differences, various studies have constrained the compositions of the outer and inner cores separately4,5,6,7. In both cases, however, the exact proportions of light elements remain hotly debated, because inversions based on a limited number of geophysical constraints are highly non-unique. Additional constraints can be introduced by considering that the outer core is in chemical equilibrium with the inner core, which crystallized from it2. Thus, the equilibrium partition coefficients of light elements between solid and liquid Fe-alloys (\({D}_{X}^{{sol}/{liq}}={C}_{X}^{{sol}}/{C}_{X}^{{liq}}\), where X is a light element and \({C}_{X}\) is the concentration of X in the solid or liquid alloy) under inner-core boundary (ICB) conditions2,8,9,10 are key parameters that control the light-element ratios between the inner and outer cores and should be included in the inversion of core composition.

The density jump (~4–5%) across the ICB is an important observation that reflects how light elements are partitioned. It is too large to be explained by the crystallization of pure Fe alone, indicating a higher concentration of light elements in the outer core than in the inner core2,6. Several first-principles molecular dynamics (FPMD) studies have used the thermodynamic integration technique to examine the chemical equilibrium of Si, S, O, and C under ICB conditions8,9,11,12. These studies demonstrated that neither Si nor S can be the sole light element in the core: with \({D}^{{sol}/{liq}}\) of ~1 and ~0.75, respectively, neither produces a large enough density jump at the ICB8,12,13. In contrast, O and C strongly prefer to stay in the liquid outer core during inner-core growth8,11,13, with \({D}^{{sol}/{liq}}\) of ~0.024 and \(<{10}^{-4}\), respectively, and thus would leave nearly no light elements in the inner core, inconsistent with its density deficit.

Hydrogen is the most abundant element in the Solar nebula. Measurements of the water content and H isotopes of enstatite chondrite meteorites suggest that the proto-Earth could have gained a large amount of H from these inner Solar System building blocks during its accretion14 and formed a hydrous magma ocean. Recent experiments15,16 and FPMD simulations17,18 demonstrate that H becomes increasingly siderophile with pressure, so early-accreted H would dissolve strongly into molten Fe as the core segregated from the magma ocean. These results suggest that H is another candidate light element that could potentially satisfy the density jump constraint, but the partitioning of H between solid and liquid Fe-alloys under ICB conditions remains difficult to determine, both experimentally and computationally. For example, iron hydride forms only at high pressure and decomposes rapidly under ambient conditions, making it difficult to measure H content in laboratory experiments19,20. Because of their small size, H atoms occupy the interstitial sites of the hcp-Fe lattice and become superionic, with very fast diffusion (\(\sim {10}^{-7}\) m²/s)21. As a result, the strong anharmonic effects of superionic H make it difficult to define a suitable reference system for calculating free energy differences and chemical potentials with the thermodynamic integration method8,9,11,12,13. By performing two-phase simulations with coexisting liquid and solid Fe-H alloys, ref. 22 directly measured H partition coefficients at ICB conditions, but their results were limited to a relatively narrow range of H contents and lacked a physical model for the composition dependence.

Here we calculate the chemical potential of H directly in Fe-H alloys using the test particle method, which is widely used for liquids and microporous solids and does not require a reference system23,24. In conjunction with a machine learning technique that vastly reduces the computational cost of FPMD, we obtained the solid/liquid partition coefficient of H (\({D}_{H}^{{sol}/{liq}}\)) and examined its influence on the density jump across the ICB. With this result, we present new constraints on the compositions of the outer and inner cores by considering chemical equilibrium between them.

Results

Partition coefficient of hydrogen at the ICB

Chemical potentials of H (\({\mu }_{H}\)) in liquid and solid Fe-H alloys at 330 GPa and 6200 K were determined using the Widom particle insertion method24,25. The chemical potential is divided into an ideal gas component and an excess part (Methods). The excess chemical potential is related to the ensemble average of the spatial integral of \(\exp \left(-\Delta U/{k}_{B}T\right)\), where \(\Delta U\) is the change in the internal energy of the system upon inserting an H test atom at a grid point in a supercell of 64–79 particles. We performed density functional theory (DFT) calculations to determine \(\Delta U\) for each insertion attempt and used molecular dynamics (MD) simulations to take the ensemble average. A typical chemical potential calculation requires more than 12,000 grid points for the spatial integration and more than 1200 integrations for the ensemble average, i.e., more than \({10}^{7}\) values of \(\Delta U\). Such an enormous number of DFT calculations poses a major challenge for chemical potential calculations. In this study, we used a machine learning technique to establish a relationship between the atomic structure (the inserted particle and its surrounding atoms), as described by the Smooth Overlap of Atomic Positions (SOAP) descriptor26 (Methods), and \(\Delta U\) for a subset of configurations. The trained neural networks were then used to predict \(\Delta U\) for the remaining insertion configurations. This approach dramatically reduced the number of DFT calculations, making the particle insertion method feasible for Fe-H systems.

The calculated chemical potentials of H in liquid and solid Fe-alloys as a function of pressure are shown in Fig. 1a. Pressure increases \({\mu }_{H}\) only slightly over the range of 313–323 GPa, so we linearly extrapolated the calculated \({\mu }_{H}\) to 330 GPa. In contrast, H concentration strongly influences \({\mu }_{H}\) at 330 GPa and 6200 K (Fig. 1b), and this compositional dependence can be described by the Bragg–Williams model27,28 (Methods). The lower chemical potential of H in the liquid than in the solid at the same concentration indicates that H prefers to enter the outer core at the ICB. This preference is corroborated by the smaller ΔU for inserting an H atom into liquid Fe than into solid Fe (Supplementary Fig. S9). The solid/liquid partition coefficient \({D}_{H}^{{sol}/{liq}}\) as a function of H content was then calculated from the ratio of H concentrations at equal chemical potentials. As shown in Fig. 1c, \({D}_{H}^{{sol}/{liq}}\) increases with the H content of the liquid, from 0.29 for pure Fe liquid to 0.46 for Fe-1.2 wt% H liquid. Our partition coefficient is lower than the value measured in an earlier experimental study at 7.5 GPa and ~1573 K20 and the value estimated from the Fe-H binary phase diagram at 15 GPa29. This reduction in \({D}_{H}^{{sol}/{liq}}\) is likely due to the extreme pressure and/or temperature at the ICB; the change in the crystal structure of Fe from fcc to hcp may also contribute. The two-phase simulations of ref. 22 at 330 GPa also showed that H favors liquid Fe, but their partition coefficients are somewhat higher than ours. The reason for this discrepancy is not known. The melting temperature of Fe obtained in ref. 22 was higher than most computational and experimental results, which may indicate that their machine learning potentials overestimate the free energy of the liquid relative to that of the solid.

Fig. 1: Chemical potential and partition coefficient of H under Earth core conditions.

a Chemical potential of H \(({\mu }_{H})\) in Fe-H alloys at 6200 K and pressures close to 330 GPa. Solid symbols are results calculated from FPMD simulations; open symbols are results linearly extrapolated to 330 GPa over a pressure range of less than 20 GPa. FexHy(z)+H1 denotes \({\mu }_{H}\) calculated by inserting an H atom into a supercell with x Fe atoms and y H atoms in state z (s for solid and l for liquid). The vertical shaded area indicates the ICB pressure of 330 GPa. b \({\mu }_{H}\) as a function of H concentration in Fe alloys at 6200 K and 330 GPa. The blue squares and red circles are \({\mu }_{H}\) in Fe-H solid and liquid, respectively. Blue and red curves are fits using the Bragg–Williams model. c Partition coefficient of H between solid and liquid Fe as a function of H concentration in the Fe-H liquid. The green, blue, and red lines are the partition coefficients from previous partitioning experiments (O97) at 7.5 GPa20, the Fe-H binary phase diagram (S14) at 15 GPa29, and our calculations at 330 GPa, respectively. Purple circles are partition coefficients estimated from two-phase simulations by Y2022 (ref. 22).

Density jump at the ICB caused by H partitioning

The seismically observed density jump at the ICB provides a strong constraint on light element concentrations in the core if the partition coefficients \(({D}^{{sol}/{liq}})\) are known30. Here we calculate the density jump at the ICB using our calculated \({D}_{H}^{{sol}/{liq}}\) and find that H alone cannot account for the observed density jump. Assuming that H is the only light element in the core, we first estimated the maximum H content of the outer core by comparing the density of Fe-H liquids, calculated from a linear mixing model of volume (Methods), with the PREM density at the ICB (12.17 g/cm³)31. Using our FPMD simulation results (Supplementary Table S1), we estimated the partial atomic volumes of H and Fe in the outer core at ICB conditions to be 1.39 and 6.97 Å³, respectively. These values imply that 0.92 ± 0.05 wt% H is required to match the density of the outer core31. This result is consistent with the estimate of 0.89 wt% H in the outer core by ref. 7, which used both density and compressional wave velocity and assumed an ICB temperature of 6000 K. For this much H in liquid Fe, \({D}_{H}^{{sol}/{liq}}\) is 0.38 ± 0.01, and therefore the inner core contains 0.35 ± 0.03 wt% H. With partial atomic volumes of H and Fe in the inner core (hcp structure) of 1.48 and 6.80 Å³, respectively, at ICB conditions, the resulting density jump at the ICB is 7.26 ± 0.31% (Fig. 2). This density jump is higher than the observed value (4.5 ± 0.5%)31, suggesting that H cannot be the only light element in the core32. Other light element candidates with higher partition coefficients, e.g., Si or S, must be present in the core to reduce the density jump caused by H at the ICB.
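The density bookkeeping in this paragraph can be reproduced with a short script. The sketch below is a minimal illustration, not the code used in this study: it assumes an Fe-H binary (no Ni), linear mixing of the partial atomic volumes quoted above, concentrations expressed per Fe atom, and a density jump normalized by the inner-core density, \(({\rho }_{IC}-{\rho }_{OC})/{\rho }_{IC}\). The helper names (`density`, `h_to_match`, and so on) are hypothetical.

```python
"""Density jump at the ICB for an Fe-H core under linear volume mixing (sketch)."""
AVOGADRO = 6.02214076e23          # 1/mol
ANG3_TO_CM3 = 1e-24               # 1 Å^3 = 1e-24 cm^3
M_FE, M_H = 55.845, 1.008         # g/mol


def density(x_h, v_fe, v_h):
    """Density (g/cm^3) of FeH_x with x_h H atoms per Fe atom (linear volume mixing)."""
    mass = M_FE + x_h * M_H                                  # g/mol per Fe atom
    volume = (v_fe + x_h * v_h) * ANG3_TO_CM3 * AVOGADRO     # cm^3/mol per Fe atom
    return mass / volume


def x_from_wt(w):
    """H/Fe atomic ratio for an H weight fraction w."""
    return w * M_FE / (M_H * (1.0 - w))


def wt_from_x(x):
    """H weight fraction for an H/Fe atomic ratio x."""
    return x * M_H / (M_FE + x * M_H)


def h_to_match(rho_target, v_fe, v_h):
    """Solve density(x) = rho_target for x (the mixing law is linear-fractional in x)."""
    c = ANG3_TO_CM3 * AVOGADRO
    return (rho_target * c * v_fe - M_FE) / (M_H - rho_target * c * v_h)


RHO_OC_PREM = 12.17                  # g/cm^3, PREM outer core at the ICB
V_FE_LIQ, V_H_LIQ = 6.97, 1.39       # Å^3, liquid partial atomic volumes
V_FE_SOL, V_H_SOL = 6.80, 1.48       # Å^3, hcp solid partial atomic volumes
D_H = 0.38                           # solid/liquid partition coefficient (wt% ratio)

x_liq = h_to_match(RHO_OC_PREM, V_FE_LIQ, V_H_LIQ)
w_liq = wt_from_x(x_liq)
w_sol = D_H * w_liq
rho_ic = density(x_from_wt(w_sol), V_FE_SOL, V_H_SOL)
jump = (rho_ic - RHO_OC_PREM) / rho_ic          # assumed normalization

print(f"outer core H: {100 * w_liq:.2f} wt%, inner core H: {100 * w_sol:.2f} wt%")
print(f"density jump: {100 * jump:.1f} %")
```

With these inputs the sketch gives about 0.93 wt% H in the outer core, about 0.35 wt% in the inner core, and a jump of roughly 7.3%, close to the values quoted above; normalizing by the outer-core density instead would give a somewhat larger number (~7.9%).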

Fig. 2: Calculated density jump between the outer core and the inner core at the ICB as a function of \({D}_{H}^{sol/liq}\).

The calculated density jump is shown as the black line, assuming that H is the only light element in the outer core and that 0.92 wt% H is present in the outer core to match its density. The gray area bracketing the black line indicates the uncertainty in the calculated density jump arising from a 0.5% uncertainty in the PREM density. The green shaded area is the observed density jump from the PREM model. The red shaded area marks our calculated \({D}_{H}^{{sol}/{liq}}\) at the ICB, and the blue shaded area the corresponding density jump.

The composition of the Earth’s core

Here we consider multiple light elements in the core and combine seismic models of core density and velocity, mineral physics data, and solid/liquid partition coefficients of light elements to find simultaneously the inner-core and outer-core compositions that best explain the seismically observed outer-core density and velocity and the density jump at the ICB. Based on their very low partition coefficients8,11, we assume there is no O or C in the inner core, so the inner core is composed of Fe, Ni, Si, S, and H, and the outer core of Fe, Ni, Si, S, O, C, and H. The density and VP of the outer core were calculated using the ideal mixing model with parameters obtained from the FPMD calculations of ref. 7 (Methods), while the density of the inner core at the ICB was obtained from our Fe-S, Fe-Si, and Fe-H simulations (Supplementary Tables S1 and S2). The effects of ~5 wt% Ni on the density and VP of the outer core and on the density of the inner core have been shown to be very small and were thus included as ad hoc corrections to the mixing results (Methods). Assuming that the partition coefficients of light elements at the ICB are mutually independent, we used a simulated annealing algorithm33 to search for the best-fit compositions of the outer and inner cores that satisfy the partition coefficients of Si8, S8,12, and H (this study) and minimize a reduced misfit function Δ (Methods), which quantifies the difference between our model predictions and PREM values.
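The search strategy can be illustrated with a generic simulated annealing loop. This is a sketch under stated assumptions, not the implementation of ref. 33: the cooling schedule, proposal width, and starting point are our own choices, and `toy_misfit` is a hypothetical quadratic placeholder for the real misfit Δ, which would instead evaluate the mixing model and partition coefficients for each trial composition.

```python
import math
import random


def simulated_annealing(objective, lower, upper, n_iter=20000, t0=1.0, t_end=1e-4,
                        step0=0.1, seed=0):
    """Minimize objective(x) within lower <= x <= upper (one bound per light element).

    Metropolis acceptance with geometric cooling from t0 to t_end; the proposal
    width (a fraction of each bound range) shrinks as the temperature drops.
    Returns the best point visited and its objective value.
    """
    rng = random.Random(seed)
    x = list(lower)                       # start with no light elements
    fx = objective(x)
    best_x, best_f = list(x), fx
    cooling = (t_end / t0) ** (1.0 / n_iter)
    t = t0
    for _ in range(n_iter):
        width = max(step0 * math.sqrt(t / t0), 1e-3)
        trial = [min(hi, max(lo, xi + width * (hi - lo) * rng.gauss(0.0, 1.0)))
                 for xi, lo, hi in zip(x, lower, upper)]
        f_trial = objective(trial)
        # accept downhill moves always, uphill moves with probability exp(-df/t)
        if f_trial <= fx or rng.random() < math.exp(-(f_trial - fx) / t):
            x, fx = trial, f_trial
            if fx < best_f:
                best_x, best_f = list(x), fx
        t *= cooling
    return best_x, best_f


def toy_misfit(c):
    """Hypothetical stand-in for the core misfit: a bowl centred on (Si, H) = (4.1, 0.47) wt%."""
    si, h = c
    return ((si - 4.1) / 1.4) ** 2 + ((h - 0.47) / 0.16) ** 2


best, f_best = simulated_annealing(toy_misfit, lower=[0.0, 0.0], upper=[10.0, 1.5])
print([round(v, 2) for v in best], round(f_best, 4))
```

In a real search the composition vector would hold all light-element contents of the outer core, with the inner-core contents derived from them through the partition coefficients before Δ is evaluated.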

The minimization was first performed for core compositions of Fe-5 wt% Ni alloyed with two light elements from Si, S, O, C, and H. Among the ten possible light-element pairs, only the (Fe, Ni)-Si-H system produces solutions with a misfit smaller than 1, i.e., within 1σ of PREM values, and only the (Fe, Ni)-Si-H and (Fe, Ni)-S-H systems produce solutions with misfits smaller than 1.5 (Fig. 3). Four other light element combinations, Si-O (Fig. 3), Si-S, S-O, and S-C (Supplementary Fig. S10), can produce solutions within 2σ of PREM. However, the Si-S, S-O, and S-C solutions all require S contents of 7.9–12.1 wt%, considerably higher than the core S content allowed by cosmochemical constraints (1.7 wt%, ref. 34). Our best-fit composition (Δ < 1, with a minimum value of 0.82) for the outer core is Fe with 5% Ni, 4.1 ± 1.4% Si, and 0.47 ± 0.16% H by weight, and the composition of the equilibrating inner core is Fe with 5% Ni, 4.1 ± 1.4% Si, and 0.15 ± 0.05% H by weight. Adding any amount of O, S, and/or C to the best-fit compositions increases the misfit. For example, a global search including all five light elements found an outer core of Fe-5%Ni-4.1%Si-0.04%C-0.1%O-0.01%S-0.44%H and an inner core of Fe-5%Ni-4.1%Si-0.01%S-0.14%H, with a misfit (\(\Delta\)) of 1.06. Thus, we conclude that H and Si are the preferred light elements in the core: they best explain the density and velocity observations and are consistent with chemical equilibrium at the ICB. Note that this does not exclude the presence of O, S, and C in the core, but the PREM constraints indicate that they are unlikely to be major constituents.

Fig. 3: The light element compositions of the outer core.

a–c Fitting results for the (Fe, Ni)-Si-H, (Fe, Ni)-S-H, and (Fe, Ni)-Si-O systems, respectively, as constrained by light element partitioning at the ICB. The ellipses represent the misfits of the calculated core properties relative to PREM. The best light element composition in each case is marked by a red dot, annotated with its misfit value.

Our best-fit core compositions are consistent with previous cosmochemical and geochemical constraints. Both core–mantle differentiation models35,36 and the fractionation of Si isotopes37,38,39 suggest that Si is a major light element in the core. If the bulk Earth formed with the same Si/Mg ratio as carbonaceous chondrites and all of the Si missing from Earth’s mantle entered the core during core–mantle differentiation, the maximum Si content of the core would be 7 wt%. The lower Si content we obtained (4.1 ± 1.4 wt%) may reflect the slight volatility of Si during Solar nebula condensation. Combining geochemical constraints from core formation models with geophysical constraints on the outer core, Badro et al. (ref. 6) found a core composition with 2–3.6% Si and 2.7–5% O, but did not include the effect of H. Such a composition requires more oxidized initial conditions than those of the present Earth. Other accretion and core formation models suggest that Earth may have initially formed under reducing conditions, with oxygen fugacity increasing in the later stages of accretion36,40,41. Our core composition results are more consistent with these low-oxygen-fugacity accretion models35,39,40 and at the same time improve the overall fit to seismological models. The H contents we obtained for the inner and outer cores are equivalent to the H in ~1 and ~50 oceans, respectively. This large quantity of H was likely accreted before or during core formation, when a substantial amount of H in the hydrous magma ocean could have partitioned into molten Fe because of its strong siderophile behavior at high pressures. Depending on the metal/silicate partition coefficient of H adopted, up to 0.3–0.6 wt% H16 or 1.0 wt% H17 could have entered the core. The presence of H in the inner core also helps explain its high Poisson’s ratio5.

Our core compositions are derived from present-day seismic models and thus represent current core properties. It is therefore important to show that such a large amount of H would be retained in the core over geological time without substantial loss to the mantle across the core–mantle boundary (CMB). Here we used the H partition coefficient between liquid iron and silicate melt (\({D}_{H}^{{Fe}/{MgSi}{O}_{3}{melts}}=\) 9.1 at 135 GPa and 4200 K; ref. 18) and the H partition coefficients between bridgmanite and silicate melt (\({D}_{H}^{{brg}/{MgSi}{O}_{3}{melts}}\) from 0.024 to 0.087, ref. 42, or even lower43,44) and between post-perovskite and bridgmanite (\({D}_{H}^{{ppv}/{brg}}=0.2\) at 121 GPa and 2400 K; ref. 45) to estimate the equilibrium H content of the lowermost mantle. Given an H content of ~0.45 wt% in the outer core and assuming chemical equilibrium between the core and mantle, the H content is estimated to be ~12–43 ppm in bridgmanite and ~2–8 ppm in post-perovskite. The extent to which the lower mantle equilibrates with the core is controlled by the diffusion rate of H. Given the diffusion coefficient of H in H-bearing bridgmanite and post-perovskite (\(D\approx 5\times {10}^{-8}\) m²/s at 140 GPa and 4000 K; ref. 46), the penetration depth47 (\(L \, \approx \, 2\sqrt{{Dt}}\)) of H into the mantle over the age of the Earth is approximately 168 km. Hence the maximum amount of H transferred from the core to the mantle is equivalent to the H in ~0.01–0.04 oceans for bridgmanite and ~0.002–0.008 oceans for post-perovskite. This quantity is far smaller than the total H content of the core (equivalent to 51 oceans) from our calculations. Therefore, core–mantle interaction is unlikely to alter the composition of the Earth’s core considerably over geological time.
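The numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below uses the partition coefficients, diffusivity, and outer-core H content quoted above; the age of the Earth (4.5 Gyr), CMB radius (3480 km), mean density of the lowermost-mantle shell (5500 kg/m³), and ocean water mass (1.4 × 10²¹ kg) are our own assumptions, needed only for the ocean-equivalent estimate.

```python
import math

SECONDS_PER_YEAR = 3.156e7

# Equilibrium H in the lowermost mantle (partition coefficients from the text)
c_core = 0.45                   # wt% H in the outer core
d_fe_melt = 9.1                 # D_H(liquid Fe / MgSiO3 melt), 135 GPa, 4200 K
d_brg_melt = (0.024, 0.087)     # D_H(bridgmanite / MgSiO3 melt), range
d_ppv_brg = 0.2                 # D_H(post-perovskite / bridgmanite)

c_melt = c_core / d_fe_melt                      # wt% H in melt equilibrated with the core
c_brg = [c_melt * d * 1e4 for d in d_brg_melt]   # ppm by weight (1 wt% = 1e4 ppm)
c_ppv = [c * d_ppv_brg for c in c_brg]

# Diffusive penetration depth L ~ 2 sqrt(D t), ASSUMED age 4.5 Gyr
diff_h = 5e-8                                    # m^2/s
age_s = 4.5e9 * SECONDS_PER_YEAR
depth_m = 2.0 * math.sqrt(diff_h * age_s)

# Ocean equivalents of H in that shell. ASSUMED: CMB radius 3480 km,
# mean shell density 5500 kg/m^3, ocean water mass 1.4e21 kg.
R_CMB = 3480e3
RHO_SHELL = 5500.0
OCEAN_H_KG = 1.4e21 * 2.016 / 18.015             # kg of H in one ocean of H2O
shell_kg = RHO_SHELL * 4.0 / 3.0 * math.pi * ((R_CMB + depth_m) ** 3 - R_CMB ** 3)
oceans_brg = [shell_kg * c * 1e-6 / OCEAN_H_KG for c in c_brg]
oceans_ppv = [shell_kg * c * 1e-6 / OCEAN_H_KG for c in c_ppv]

print(f"bridgmanite: {c_brg[0]:.0f}-{c_brg[1]:.0f} ppm, "
      f"post-perovskite: {c_ppv[0]:.1f}-{c_ppv[1]:.1f} ppm")
print(f"penetration depth: {depth_m / 1e3:.1f} km")
print(f"oceans (bridgmanite): {oceans_brg[0]:.3f}-{oceans_brg[1]:.3f}")
```

With these assumptions the script gives ~12–43 ppm H in bridgmanite, a penetration depth of about 168.5 km, and roughly 0.01–0.04 ocean equivalents for a bridgmanite shell, in line with the estimates above.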

We compare the density and \({V}_{P}\) of the core calculated from our best-fit (Fe, Ni)-Si-H model with those of PREM in Fig. 4. Our results reproduce the PREM density profile very well, whereas the calculated \({V}_{P}\) profile of the outer core agrees at most depths but deviates from that of PREM by more than the 1\(\sigma\) uncertainty in the top 400 km of the outer core. This deviation is consistent with the low-velocity anomaly in the Earth’s outermost core (the E′ layer, 300–700 km below the CMB) proposed in several studies48,49. Possible explanations for the E′ layer include a composition gradient due to core–mantle reaction50, stratification following a giant impact51, and immiscibility of Fe-Si-O liquids52. Seismic velocities of the inner core could provide additional constraints on our composition model, but this awaits future studies of the velocities of Fe-S-Si-H alloys under inner-core conditions. It should be noted that our composition model assumes that the partition coefficients of the various light elements are independent of each other over the composition range relevant to the core; however, pair interactions between light elements in Fe-alloys have been observed for S-Si53,54, S-C55, C-H56, etc. Specifically, recent solid/liquid partitioning experiments54 on the Fe-Si-S system up to 189 GPa have demonstrated that \({D}_{S}\) decreases considerably with increasing Si content, whereas \({D}_{{Si}}\) increases with increasing S content. We conducted an additional global search of core compositions using \({D}_{{Si}}\) = 1.1 and \({D}_{S}\) = 0.2, extrapolated from ref. 54, to explore the effect of Si-S interactions on the optimal compositions. In this case, the most favorable outer-core composition contains 5 wt% Ni (fixed), 3.96 wt% Si, 0.04 wt% C, 0.09 wt% O, 0.02 wt% S, and 0.44 wt% H, suggesting a slight decrease in the Si content of the outer core but negligible changes in the contents of the other light elements.
However, this does not imply that the interactions of light elements are insignificant; rather, it highlights the absence of data on Si-H interactions. These interactions need to be determined at ICB conditions and be included in future core composition models when available.

Fig. 4: Calculated density and bulk sound velocity of the Earth’s core using the best-fit composition.

The calculated density (a) and bulk sound velocity (b) were obtained along the geotherm (\({T}_{{ICB}}\) = 6200 K) for the best-fit composition of the Earth’s core ((Fe, Ni)-4.1 wt% Si-0.47 wt% H for the outer core and (Fe, Ni)-4.1 wt% Si-0.15 wt% H for the inner core). The blue curves are the PREM density and velocity, with their uncertainties shown as the blue shaded areas.

Methods

Calculation of chemical potential and partition coefficient

The test particle method proposed by Widom25 calculates the chemical potential directly from its definition and, unlike the thermodynamic integration method9, requires no reference system24. For a system containing \({N}_{{Fe}}\) Fe atoms and \({N}_{H}\) H atoms, the chemical potential of hydrogen in the canonical (NVT) ensemble can be approximated as24:

$${\mu }_{{{{{{\rm{H}}}}}}}(V,T,{N}_{H}) \, \approx \, F\left(V,T,{N}_{{Fe}},{N}_{H}+1\right)-F\left(V,T,{N}_{{Fe}},{N}_{H}\right)$$
(1)

where V is the volume, T is the temperature, and F is the Helmholtz free energy. Using the classical partition function of the canonical ensemble, the chemical potential \({\mu }_{H}\) can be divided into an ideal gas term \({\mu }_{{id}}^{H}\) and an excess term \({\mu }_{{ex}}^{H}\):

$${\mu }_{H}={\mu }_{{id}}^{H}+{\mu }_{{ex}}^{H}=-{k}_{B}T\ln \left(\frac{V}{\left({N}_{H}+1\right){\Lambda }_{H}^{3}}\right)-{k}_{B}T\ln \left(\frac{1}{V}{\left\langle \int \exp \left(-\frac{\Delta U}{{k}_{B}T}\right)d{\boldsymbol{r}}_{N+1}\right\rangle }_{N}\right)$$
(2)

where \({k}_{B}\) is the Boltzmann constant, \({\Lambda }_{H}=h/\sqrt{2\pi {m}_{H}{k}_{B}T}\) is the thermal de Broglie wavelength of an H atom (h is the Planck constant and \({m}_{H}\) the mass of an H atom), r is an atomic position vector, U is the potential energy of the system, and ΔU is the potential energy change due to insertion of an H atom at position \({\boldsymbol{r}}_{N+1}\). This energy change (ΔU) includes the changes in all Fe-H, H-H, and Fe-Fe interactions caused by the insertion. The angle brackets \({\left\langle \ldots \right\rangle }_{N}\) in Eq. (2) denote the ensemble average of a spatial integral over the N-particle system,

$${\left\langle \int exp\left(-\frac{\varDelta U}{{k}_{B}T}\right)d{{{{{{\boldsymbol{r}}}}}}}_{N+1}\right\rangle }_{N}=\frac{\int \exp [-U({{{{{{\boldsymbol{r}}}}}}}^{N+1})/{k}_{B}T]d{{{{{{\boldsymbol{r}}}}}}}^{N+1}}{\int \exp [-U({{{{{{\boldsymbol{r}}}}}}}^{N})/{k}_{B}T]d{{{{{{\boldsymbol{r}}}}}}}^{N}}.$$
(3)

In our simulations, this ensemble average was taken over ~1200–2400 evenly spaced snapshots (one in every ten) of atomic configurations from MD trajectories (Supplementary Fig. S8). For each snapshot, the spatial integral \(\int \exp \left(-\Delta U/{k}_{B}T\right)d{\boldsymbol{r}}_{N+1}\) was evaluated by inserting an H atom at different hypothetical positions. In principle, this integral extends over the entire space, but in practice it can be approximated well by inserting the H atom on a uniform 3-D grid spanning the supercell. Most insertion attempts yield a vanishingly small value of \(\exp \left(-\Delta U/{k}_{B}T\right)\) because they place the H atom too close to an existing atom, incurring a large energy penalty. We performed test calculations with various grid sizes and threshold distances (\({r}_{{Fe}-H}\) or \({r}_{H-H}\)) between the inserted particle (H) and solvent particles (Fe or H) to check convergence (see Supplementary Notes S1 for detailed results) and to minimize computational cost where possible. We found that a grid size of 0.28 Å (distance between adjacent grid points) was small enough to achieve the required accuracy (Supplementary Figs. S2 and S3). For example, a 23 × 23 × 23 grid (12,167 points) for the Fe32H32 supercell satisfies the 0.28 Å grid size requirement. Moreover, we found that on average only about 40% of the grid points contribute to the spatial integral. These grid points were selected using two threshold distances, \({r}_{{Fe}-H}\) = 1.0 Å and \({r}_{H-H}\) = 0.6 Å (Supplementary Figs. S2–S4); insertions closer than these thresholds contribute negligibly. The thresholds are consistent with the distances at which the radial pair distribution functions \({g}_{{Fe}-H}\) and \({g}_{H-H}\) start to increase dramatically (Supplementary Fig. S5).
The method can be applied to both solid and liquid Fe-alloys because H is present predominantly as interstitial atoms in the Fe-H alloy.
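The grid-based estimate of Eqs. (2) and (3) can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the code used in this work: the supercell is taken to be orthorhombic, the ΔU values at accessible grid points are supplied externally (from DFT or the neural network), grid points inside the threshold distance are assigned a zero Boltzmann factor, and the function names are hypothetical. On a uniform grid of n points the volume element is V/n, so \((1/V)\int \exp(-\Delta U/{k}_{B}T)\,d\boldsymbol{r}\) reduces to the average of \(\exp(-\Delta U/{k}_{B}T)\) over all n points, evaluated here in log space to avoid overflow.

```python
import numpy as np

K_B = 8.617333262e-5          # Boltzmann constant, eV/K
K_B_SI = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34     # Planck constant, J s
M_H_KG = 1.6735575e-27        # mass of an H atom, kg


def grid_points(box, n):
    """Cell-centred uniform n x n x n grid in an orthorhombic box (Å)."""
    axes = [(np.arange(n) + 0.5) * length / n for length in box]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=1)


def accessible(points, atoms, box, r_min):
    """True where a grid point is farther than r_min (Å) from every atom (minimum image)."""
    d = points[:, None, :] - atoms[None, :, :]
    d -= box * np.round(d / box)
    return np.linalg.norm(d, axis=2).min(axis=1) > r_min


def excess_mu(du_per_snapshot, n_grid, temperature):
    """mu_ex = -kT ln <(1/V) ∫ exp(-dU/kT) dr>_N in eV, with the integral as a grid average.

    du_per_snapshot: one array of dU (eV) per snapshot, holding only accessible points;
    excluded points count as exp(-dU/kT) = 0. n_grid: total grid points per snapshot.
    """
    kt = K_B * temperature
    log_sums = []
    for du in du_per_snapshot:
        a = -np.asarray(du, dtype=float) / kt
        a_max = a.max()
        log_sums.append(a_max + np.log(np.exp(a - a_max).sum()))   # ln Σ exp(-dU/kT)
    log_sums = np.array(log_sums)
    l_max = log_sums.max()
    # average over snapshots, then divide by n_grid (volume element V / n_grid)
    log_avg = l_max + np.log(np.exp(log_sums - l_max).mean()) - np.log(n_grid)
    return -kt * log_avg


def ideal_mu(volume_ang3, n_h, temperature):
    """mu_id = -kT ln[V / ((N_H + 1) Λ_H^3)] in eV."""
    lam = H_PLANCK / np.sqrt(2.0 * np.pi * M_H_KG * K_B_SI * temperature) * 1e10   # Å
    return -K_B * temperature * np.log(volume_ang3 / ((n_h + 1) * lam ** 3))


# Illustration with placeholder data (random Fe positions, synthetic dU values).
rng = np.random.default_rng(0)
box = np.array([6.44, 6.44, 6.44])               # Å, roughly the size of an Fe32H32 cell
pts = grid_points(box, 23)                       # 12,167 points, ~0.28 Å apart
fe = rng.uniform(0.0, 6.44, size=(32, 3))
mask = accessible(pts, fe, box, r_min=1.0)
du = rng.normal(0.5, 0.3, size=int(mask.sum()))  # stand-in for NN-predicted dU, eV
mu = ideal_mu(box.prod(), 0, 6200.0) + excess_mu([du], len(pts), 6200.0)
print(f"accessible fraction {mask.mean():.2f}, mu_H = {mu:.2f} eV")
```

For the H–H threshold, `accessible` would be called a second time with the H positions and `r_min=0.6`, and the two masks combined with a logical AND.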

The chemical potential at a given pressure can be obtained by subtracting a correction term57 due to the pressure change during particle insertion,

$${\mu }_{H}\left(P,T,{N}_{H}\right)={\mu }_{H}\left(V,T,{N}_{H}\right)-\frac{V}{2{K}_{T}}\Delta {P}^{2}$$
(4)

where \({K}_{T}\) is the isothermal bulk modulus, taken as 1375 GPa for solid Fe and 1318 GPa for liquid Fe11. The pressure change ΔP is estimated as \(-\frac{\Delta V}{V}{K}_{T}\), with \(\Delta V\) being the volume of an H atom. The correction term needed to obtain \({\mu }_{H}\left(P,T,{N}_{H}\right)\) is estimated to be about 0.03 eV/atom. Note that the unit eV/atom for \({\mu }_{H}\) denotes the free energy difference of the entire supercell divided by the number of H atoms inserted during particle insertion, which is always one in this study. By equating the chemical potentials of H in liquid and solid Fe-alloys at 330 GPa and 6200 K, we obtain the partition coefficient of H between the inner core and the outer core as \({D}_{H}^{{sol}/{liq}}={C}_{H}^{{sol}}/{C}_{H}^{{liq}}\), where \({C}_{H}^{{sol}}\) and \({C}_{H}^{{liq}}\) are the concentrations (in wt%) of H in the equilibrating solid and liquid Fe-alloys, respectively.
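Eq. (4) with \(\Delta P=-(\Delta V/V){K}_{T}\) simplifies to \({(\Delta V)}^{2}{K}_{T}/(2V)\). The short sketch below evaluates it; taking ΔV as the liquid partial atomic volume of H (1.39 Å³) and V as an Fe32H32 cell built from the partial volumes quoted in the Results are illustrative assumptions on our part, and `pressure_correction_ev` is a hypothetical name.

```python
EV = 1.602176634e-19          # J per eV


def pressure_correction_ev(delta_v_ang3, volume_ang3, k_t_gpa):
    """V/(2 K_T) * dP^2 with dP = -(dV/V) K_T, returned in eV."""
    k_t = k_t_gpa * 1e9 * 1e-30                  # GPa -> J/Å^3
    dp = -(delta_v_ang3 / volume_ang3) * k_t     # J/Å^3
    return volume_ang3 / (2.0 * k_t) * dp ** 2 / EV


v_cell = 32 * 6.97 + 32 * 1.39    # Å^3, Fe32H32 cell from partial volumes (assumption)
corr = pressure_correction_ev(1.39, v_cell, 1318.0)
print(f"{corr:.3f} eV")
```

For these inputs the correction is about 0.03 eV, the magnitude stated above; it scales as 1/V, so it shrinks for larger supercells.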

First-principles molecular dynamics

We performed first-principles molecular dynamics (FPMD) simulations to evaluate internal energies and chemical potentials of both liquid and solid Fe-H alloys using the VASP package58,59 with the projector augmented-wave (PAW) method59,60 for density functional theory (DFT) calculations. Exchange–correlation was treated with the generalized gradient approximation (GGA) in the Perdew–Burke–Ernzerhof (PBE) form61, with valence configurations of \(3{p}^{6}3{d}^{7}4{s}^{1}\), \(1{s}^{1}\), \(3{s}^{2}3{p}^{4}\), and \(3{s}^{2}3{p}^{2}\) for Fe, H, S, and Si, respectively. The plane-wave energy cutoff was set to 500 eV, and the Γ point was used for Brillouin zone sampling. The molecular dynamics simulations were performed in the canonical (NVT) ensemble with a fixed number of atoms, volume, and temperature. Fermi–Dirac statistics were used to populate single-particle orbitals, with the electronic temperature set equal to the macroscopic temperature, which was kept constant using a Nosé–Hoover thermostat62.

We performed additional test calculations to evaluate the effects of a higher energy cutoff and a larger supercell. An energy cutoff of 800 eV decreases the average potential energy change (ΔU) upon particle insertion, and in turn the chemical potential, by 0.010(2) eV/atom for both solid and liquid Fe-H systems (Supplementary Notes S2). In addition, a supercell of ~130 atoms (Fe128H2 solid) decreases the chemical potential by 0.05 eV/atom (Supplementary Notes S3). We therefore applied a correction of −0.06 eV/atom, the sum of these two effects, to all chemical potentials obtained in this study (Supplementary Table S1). This correction is comparable to the uncertainties of the neural network model.

For a liquid, a 4 × 4 × 2 hcp supercell (64 atoms) was strained to a cube to serve as the starting structure. The structure was first melted and equilibrated at 20,000 K for 2.5 ps and then quenched to the target temperature over 2.5 ps. At the target temperature, the system was run in the NVT ensemble for more than 12 ps to achieve equilibration, with a time step of 1 fs for pure Fe and 0.3–0.5 fs for Fe-H alloys.

For a solid, the starting structure was constructed by expanding the 4-atom C-centered (orthohexagonal) unit cell of the hcp structure to a 4 × 2 × 2 supercell (64 Fe atoms) to reach a hydrostatic stress state63. Because of their small size, H atoms were added at interstitial rather than substitutional sites in the hcp structure. The supercell was directly equilibrated at the target temperature for about 12 ps in the NVT ensemble, with time steps similar to those used for the liquids.
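The 64-atom solid starting structure can be generated from the four-atom orthohexagonal cell (a, √3a, c). The NumPy sketch below is illustrative: the lattice parameter (a = 2.3 Å) and the ideal c/a ratio are placeholder values, not the ones used in our simulations, and placing H at octahedral interstices is an assumption, since only "interstitial sites" are specified above. The helper names are hypothetical.

```python
import numpy as np

# Fractional coordinates of the 4-atom C-centred orthohexagonal hcp cell (a, sqrt(3) a, c):
# two atoms in the A layer (z = 0) and two in the B layer (z = 1/2).
HCP_ORTHO = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.5, 0.0],
                      [0.5, 1 / 6, 0.5],
                      [0.0, 2 / 3, 0.5]])
# Octahedral interstices: C stacking positions at z = 1/4 and 3/4 (one per Fe atom).
OCT_ORTHO = np.array([[0.0, 1 / 3, 0.25],
                      [0.5, 5 / 6, 0.25],
                      [0.0, 1 / 3, 0.75],
                      [0.5, 5 / 6, 0.75]])


def supercell(frac, a, c_over_a, reps=(4, 2, 2)):
    """Cartesian positions (Å) and box lengths for reps copies of the orthohexagonal cell."""
    cell = np.array([a, np.sqrt(3.0) * a, c_over_a * a])
    shifts = np.array([[i, j, k] for i in range(reps[0])
                       for j in range(reps[1]) for k in range(reps[2])], dtype=float)
    frac_all = (frac[None, :, :] + shifts[:, None, :]).reshape(-1, 3) / np.array(reps)
    box = cell * np.array(reps)
    return frac_all * box, box


def min_distance(x, y, box):
    """Smallest minimum-image distance between point sets x and y (self-pairs ignored)."""
    d = x[:, None, :] - y[None, :, :]
    d -= box * np.round(d / box)
    r = np.linalg.norm(d, axis=2)
    r[r < 1e-9] = np.inf
    return r.min()


a = 2.3                                     # Å, placeholder lattice parameter
fe, box = supercell(HCP_ORTHO, a, np.sqrt(8.0 / 3.0))
oct_sites, _ = supercell(OCT_ORTHO, a, np.sqrt(8.0 / 3.0))
print(len(fe), box.round(2), round(min_distance(fe, fe, box), 3))
```

Selecting a subset of rows of `oct_sites` and appending them to the Fe positions gives an Fe64Hn starting configuration; for the ideal c/a ratio the nearest Fe–Fe distance equals a and the octahedral Fe–H distance is a/√2. The 4 × 2 × 2 repetition keeps the three box lengths similar, which is convenient for a hydrostatic cell.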

Neural network model

Atomistic machine learning can establish a relationship between the atomic structure of a system and its properties26. The potential energy change (ΔU) depends on the position of the inserted particle relative to its local atomic environment. In our model, we used the Smooth Overlap of Atomic Positions (SOAP) package to encode the local atomic structure around each inserted particle as a descriptor26,64, a one-dimensional array of 972 parameters (detailed parameters are given in Supplementary Notes S4). These descriptors serve as the input to the neural network model26,56. The target property is the potential energy change (ΔU) upon insertion, obtained from DFT calculations.

Artificial neural networks were used to construct a prediction model using the MATLAB Neural Network Toolbox. Two-layer feed-forward networks with 972 input parameters, 32 neurons in the hidden layer, and 1 output (972-32-1) were trained to map the descriptors to the corresponding potential energy changes. In total, about 80,000 data points for the solid (11 snapshots with >7200 insertion positions each) and about 110,000 for the liquid (23 snapshots with >4700 insertion positions each) were used to train and validate the neural network models. Of these data, 70% were used for training, 15% for validation, and 15% for testing. We repeated the training procedure for each Fe-alloy composition to obtain a separate neural network model for each. The validity of the models was further verified with additional independent snapshots.
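A Python analogue of the 972-32-1 regression, with a random 70/15/15 split and early stopping on the validation set, is sketched below. It is not the MATLAB workflow used here: MATLAB's toolbox trains with Levenberg–Marquardt by default, whereas this sketch uses plain full-batch gradient descent to stay short, and the descriptors and targets are synthetic stand-ins rather than SOAP vectors and DFT ΔU values. All function names are hypothetical.

```python
import numpy as np


def split_indices(n, rng, fractions=(0.70, 0.15, 0.15)):
    """Random 70/15/15 split into training, validation, and test indices."""
    idx = rng.permutation(n)
    n_tr, n_va = int(fractions[0] * n), int(fractions[1] * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]


def predict(p, x):
    """972-32-1 network: tanh hidden layer, linear output neuron."""
    return (np.tanh(x @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]).ravel()


def rmse(p, x, y):
    return float(np.sqrt(np.mean((predict(p, x) - y) ** 2)))


def train(x, y, n_hidden=32, lr=0.1, epochs=300, seed=0):
    """Full-batch gradient descent on 0.5*MSE with validation-based early stopping."""
    rng = np.random.default_rng(seed)
    tr, va, te = split_indices(len(y), rng)
    x_tr, y_tr, x_va, y_va = x[tr], y[tr], x[va], y[va]
    p = {"W1": rng.normal(0.0, 1.0 / np.sqrt(x.shape[1]), (x.shape[1], n_hidden)),
         "b1": np.zeros(n_hidden),
         "W2": rng.normal(0.0, 0.01, (n_hidden, 1)),
         "b2": np.zeros(1)}
    best = {k: v.copy() for k, v in p.items()}
    best_val = init_val = rmse(p, x_va, y_va)
    for _ in range(epochs):
        h = np.tanh(x_tr @ p["W1"] + p["b1"])                 # hidden activations
        resid = (h @ p["W2"] + p["b2"]).ravel() - y_tr        # training residuals
        g_out = resid[:, None] / len(tr)                       # dLoss/d(output)
        g_pre = (g_out @ p["W2"].T) * (1.0 - h ** 2)          # back-propagate through tanh
        grads = {"W2": h.T @ g_out, "b2": g_out.sum(axis=0),
                 "W1": x_tr.T @ g_pre, "b1": g_pre.sum(axis=0)}
        for k in p:
            p[k] -= lr * grads[k]
        val = rmse(p, x_va, y_va)
        if val < best_val:                                     # keep the best validation model
            best_val, best = val, {k: v.copy() for k, v in p.items()}
    return best, init_val, best_val, rmse(best, x[te], y[te]), te


# Synthetic stand-in data: random 972-element "descriptors" and a smooth target.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 972))
y = np.tanh(X[:, :8].sum(axis=1) / np.sqrt(8.0)) + 0.05 * rng.normal(size=1000)
model, val0, val_best, test_rmse, te = train(X, y)
print(f"validation RMSE {val0:.3f} -> {val_best:.3f}, test RMSE {test_rmse:.3f}")
```

Validation-based early stopping is what the 15% validation subset is for: training continues while the validation error improves, and the parameters with the lowest validation error are kept; the 15% test subset is then used only for the final error estimate.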

Our neural network models predict ΔU for an insertion very well. As an example, Supplementary Figs. S6 (training, validation, and test data) and S7 (independent validation data) compare the ΔU predicted by the neural network models with that computed by DFT for Fe64 solid and Fe32H32 liquid. Most predictions fall very close to the 1:1 line in Supplementary Figs. S6 and S7. For the low-ΔU insertions (ΔU < 0) that contribute most to the spatial integral in Eq. (2), the 1\(\sigma\) uncertainty in ΔU is ~0.03 eV/atom for the solid and ~0.08 eV/atom for the liquid (Supplementary Fig. S6c, d). The calculated chemical potentials have lower uncertainties than ΔU because random prediction errors partly cancel when \(\exp \left(-\Delta U/{k}_{B}T\right)\) is summed over all possible insertions.

Bragg–Williams model

Unlike S, Si, O, and C, H is accommodated at interstitial sites in both liquid and solid Fe alloys because of its small size. H in an Fe alloy can therefore move around freely and behaves like a gas at low concentrations65. Such behavior can be modeled by the Bragg–Williams model, in which the chemical potential of H in FeHx can be written as28

$${\mu }_{{{{{{\rm{H}}}}}}}=A+{kT}{{{{\mathrm{ln}}}}}\frac{x}{B-x}+\frac{{Cx}}{B}$$
(5)

where x is the number of H atoms per Fe atom, A is the energy of H in the Fe host, B is the number of nearest interstitial sites per Fe atom, and C is the H-H pair interaction. The values of A, B, and C fitted to our calculated results are −0.84(3) eV/atom, 5.13(1), and −3.33(1) eV/atom, respectively, for \({\mu }_{H}\) in the Fe-H solid, and −2.00(8) eV/atom, 2.05(1), and −0.42(1) eV/atom, respectively, for \({\mu }_{H}\) in the Fe-H liquid (Fig. 1b).
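The partition coefficient follows from equating Eq. (5) for the two phases. The sketch below uses the fitted parameters listed above and assumes that x is the H/Fe atomic ratio and that concentrations are converted to wt% for an Fe-H binary; the bisection is restricted to H/Fe ≤ 1 because, with C = −3.33 eV, the fitted solid curve is not monotonic at higher x. Function names are hypothetical.

```python
import math

K_B = 8.617333262e-5            # Boltzmann constant, eV/K
M_FE, M_H = 55.845, 1.008       # molar masses, g/mol

# Bragg-Williams fits quoted in the text (A and C in eV, B dimensionless)
SOLID = {"A": -0.84, "B": 5.13, "C": -3.33}
LIQUID = {"A": -2.00, "B": 2.05, "C": -0.42}


def mu_h(x, A, B, C, T=6200.0):
    """Eq. (5): mu_H = A + kT ln[x/(B - x)] + C x / B, with x = H atoms per Fe atom."""
    return A + K_B * T * math.log(x / (B - x)) + C * x / B


def x_at_mu(mu, phase, T=6200.0, x_max=1.0, tol=1e-13):
    """Invert Eq. (5) by bisection on 0 < x <= x_max, where the fitted curves increase with x."""
    lo, hi = 1e-15, min(x_max, phase["B"] - 1e-9)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mu_h(mid, T=T, **phase) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def x_from_wt(w):
    """H/Fe atomic ratio from wt% H (Fe-H binary)."""
    f = w / 100.0
    return f * M_FE / (M_H * (1.0 - f))


def wt_from_x(x):
    """wt% H from the H/Fe atomic ratio."""
    return 100.0 * x * M_H / (M_FE + x * M_H)


def partition_coefficient(wt_liq, T=6200.0):
    """D_H(sol/liq) = C_sol / C_liq (wt%) for a liquid containing wt_liq wt% H."""
    mu_liq = mu_h(x_from_wt(wt_liq), T=T, **LIQUID)
    return wt_from_x(x_at_mu(mu_liq, SOLID, T)) / wt_liq


for w in (0.01, 0.47, 0.92, 1.2):
    print(f"{w:5.2f} wt% H in liquid -> D = {partition_coefficient(w):.2f}")
```

With these parameters the sketch gives D ≈ 0.29 for a dilute liquid, ≈0.38 at 0.92 wt% H, and ≈0.45 at 1.2 wt% H, close to the values reported in the Results. In the dilute limit the ln terms dominate and D approaches \(({B}_{s}/{B}_{l})\exp [({A}_{l}-{A}_{s})/kT]\).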

Ideal mixing model and simulated annealing algorithm

The ideal mixing model7 was used to calculate the density and \({V}_{P}\) of the outer core. In this model, we have

$$\rho =\frac{M}{V}=\frac{{\sum }_{i}{X}_{i}{M}_{i}}{{\sum }_{i}{X}_{i}{V}_{i}}$$
(6)
$$\frac{V}{K}={\sum }_{i}{X}_{i}\frac{{V}_{i}}{{K}_{i}}$$
(7)
$${V}_{P}=\sqrt{K/\rho }$$
(8)

where \({X}_{i}\), \({M}_{i}\), \({V}_{i}\), and \({K}_{i}\) are the atomic fraction, atomic mass, partial atomic volume, and bulk modulus of the i-th component (Fe, S, Si, C, O, or H) in the Fe-alloy; V, \(\rho\), K, and \({V}_{P}\) are the volume, density, adiabatic bulk modulus, and compressional wave velocity (bulk sound velocity) of the mixture, respectively. We used the values of \({V}_{i}\) and \({K}_{i}\) from previous FPMD calculations7. The density and \({V}_{P}\) calculated from Eqs. (6)–(8) were then corrected for the presence of 5 wt% Ni in the core by increasing the density by 0.18% and decreasing \({V}_{P}\) by 0.19%7.
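Eqs. (6)–(8) and the Ni correction translate directly into code. In the sketch below the partial volumes and bulk moduli are inputs, because the FPMD values of ref. 7 are not reproduced here; lumping Ni with Fe as the balance when converting wt% to atomic fractions is our simplifying assumption, and the function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23
MOLAR_MASS = {"Fe": 55.845, "Si": 28.085, "S": 32.06, "O": 15.999, "C": 12.011, "H": 1.008}


def atomic_fractions(wt_light):
    """Atomic fractions X_i from {element: wt%} of light elements; Fe is the balance
    (Ni is lumped with Fe here and handled by the separate correction)."""
    wt = dict(wt_light)
    wt["Fe"] = 100.0 - sum(wt_light.values())
    moles = {el: w / MOLAR_MASS[el] for el, w in wt.items()}
    total = sum(moles.values())
    return {el: n / total for el, n in moles.items()}


def ideal_mixing(x, v, k, ni_correction=True):
    """Eqs. (6)-(8). x: atomic fractions; v: partial atomic volumes (Å^3);
    k: component bulk moduli (GPa). Returns density (g/cm^3) and V_P (km/s)."""
    v_mix = sum(x[e] * v[e] for e in x)                     # Å^3 per atom
    m_mix = sum(x[e] * MOLAR_MASS[e] for e in x)            # g/mol
    rho = m_mix / (AVOGADRO * v_mix * 1e-24)                # g/cm^3, Eq. (6)
    k_mix = v_mix / sum(x[e] * v[e] / k[e] for e in x)      # GPa, Eq. (7)
    vp = math.sqrt(k_mix * 1e9 / (rho * 1e3)) / 1e3         # km/s, Eq. (8)
    if ni_correction:                                       # 5 wt% Ni correction (see text)
        rho *= 1.0018
        vp *= 1.0 - 0.0019
    return rho, vp


x = atomic_fractions({"Si": 4.1, "H": 0.47})   # best-fit outer core, Fe(+Ni) as balance
print({el: round(f, 3) for el, f in x.items()})
```

For the best-fit outer core (4.1 wt% Si, 0.47 wt% H), H makes up about 20% of the atoms even though it is less than 0.5 wt%, which is why its partial volume weighs so heavily in Eq. (6).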

We applied the simulated annealing algorithm to search for the combination of light elements (Si, O, S, C, and H) that satisfies chemical equilibrium between the inner and outer cores and best reproduces the density and \({V}_{P}\) of the outer core and the density of the inner core. The misfit function for the minimization is defined as

$$\begin{aligned}\Delta =\frac{1}{N-M}\Bigg\{ & {\sum }_{i}{\left[{\left(\frac{{\rho }_{i}^{{Calc}}-{\rho }_{i}^{{PREM}}}{{\sigma }_{\rho ,i}}\right)}^{2}+{\left(\frac{{{V}_{P}}_{i}^{{Calc}}-{{V}_{P}}_{i}^{{PREM}}}{{\sigma }_{{V}_{P},i}}\right)}^{2}\right]}_{{OC}}\\ & +{\left(\frac{{\rho }_{{IC}}^{{Calc}}-{\rho }_{{IC}}^{{PREM}}}{{\sigma }_{\rho ,{IC}}}\right)}^{2}\Bigg\}\end{aligned}$$
(9)

where \({\rho }_{i}\) and \({{V}_{P}}_{i}\) are the density and \({V}_{P}\) of the outer core (OC) at nine pressures (135, 150, 175, 200, 225, 250, 275, 300, and 325 GPa), and \({\rho }_{{IC}}\) is the density of the inner core. Superscripts “Calc” and “PREM” indicate calculated and PREM values, respectively. N − M is the number of degrees of freedom of the fit, where N and M are the numbers of data points and fitted parameters, respectively. The uncertainties in density (\({\sigma }_{\rho ,i}\) and \({\sigma }_{\rho ,{IC}}\)) and \({V}_{P}\) (\({\sigma }_{{V}_{P},i}\)) are estimated to be ~0.58% and ~0.82%, respectively, obtained by combining in quadrature the uncertainties of the FPMD calculations (0.3% for density and 0.8% for velocity) and of the PREM model (0.5% for density and 0.2% for velocity).
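Eq. (9) can be evaluated as below. This sketch assumes that the percentage uncertainties are applied relative to the PREM values and that the FPMD and PREM contributions combine in quadrature (which reproduces the ~0.58% and ~0.82% quoted above); the PREM values themselves must be supplied by the user, and `reduced_misfit` is a hypothetical name.

```python
import numpy as np

# 1-sigma uncertainties (%) combining FPMD and PREM contributions in quadrature
SIG_RHO = float(np.hypot(0.3, 0.5))   # density, ~0.58 %
SIG_VP = float(np.hypot(0.8, 0.2))    # V_P, ~0.82 %


def reduced_misfit(rho_oc, rho_oc_prem, vp_oc, vp_oc_prem, rho_ic, rho_ic_prem, n_params):
    """Eq. (9): chi-square over the outer-core pressures (density and V_P) plus the
    inner-core density, divided by N - M. Sigmas are taken relative to PREM values."""
    rho_oc, rho_oc_prem = np.asarray(rho_oc, float), np.asarray(rho_oc_prem, float)
    vp_oc, vp_oc_prem = np.asarray(vp_oc, float), np.asarray(vp_oc_prem, float)
    chi2 = np.sum(((rho_oc - rho_oc_prem) / (SIG_RHO / 100 * rho_oc_prem)) ** 2)
    chi2 += np.sum(((vp_oc - vp_oc_prem) / (SIG_VP / 100 * vp_oc_prem)) ** 2)
    chi2 += ((rho_ic - rho_ic_prem) / (SIG_RHO / 100 * rho_ic_prem)) ** 2
    n_data = rho_oc_prem.size + vp_oc_prem.size + 1          # N = 9 + 9 + 1 = 19
    return float(chi2 / (n_data - n_params))
```

With nine outer-core pressures, N = 19; for example, a two-parameter (Si, H) fit whose predictions all lie exactly 1σ from PREM gives Δ = 19/17 ≈ 1.12, so Δ < 1 means the model sits, on average, within 1σ of PREM.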