Main

Earth’s rotation is not fixed1,2. Changes in the spin rate Ωo lead to changes in the length of the day. The movement of the spin axis, termed polar motion, causes the rotational poles—points where the spin axis intersects Earth’s surface—to wander. The polar motion is described by the movement of the pole position, with coordinates xp and yp defined positive towards the central Greenwich meridian and 90° W longitude, respectively. Figure 1a shows the polar motion time series provided by the International Earth Rotation and Reference Systems Service (IERS)3. The signal is dominated by the Chandler wobble (period of approximately 433 days)—a free rotational mode excited by a combination of atmospheric and oceanic processes4—and an annual wobble, driven by seasonal atmospheric forcing5. To unravel other intriguing polar motion signals at lower frequencies, we apply a low-pass filter on the IERS time series to remove the Chandler and annual wobbles and all other signals with periods shorter than 500 days (Methods). Figure 1b shows this filtered signal, which we refer to as the long-period polar motion. It consists of a secular trend of approximately 3 milliarcseconds (mas) per year in the direction of Hudson Bay and interannual and multidecadal fluctuations with amplitudes of about 20–40 mas (refs. 6,7,8) (Fig. 1c). The fluctuations include a quasi-periodic signal with a typical timescale of approximately 30 years, often referred to as the Markowitz wobble9. The driving mechanisms of the secular trend and long-period fluctuations over the entirety of the observational record are not fully understood and are the focus of our study.

Fig. 1: Polar motion observations.

a, Polar motion (1900–2018) from the IERS C01 series in units of mas. b, The long-period polar motion (solid lines) and their uncertainties at the one standard deviation level (shaded envelope, that is, mean signal ± their uncertainties) after the removal of all periods shorter than 500 days. In both a and b, the reference (xp = yp = 0) is chosen as the mean position in the interval 2002–2018 (shaded grey area). c, Polar view of the long-period polar motion relative to 1900. xp is positive towards Greenwich (0°); yp is positive towards 90° W. An angular motion of 1 mas corresponds to a displacement of 3.09 cm at Earth’s surface.
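The angle-to-distance conversion quoted in the caption can be checked with a few lines of arithmetic. The sketch below is illustrative only: the mean Earth radius of 6,371 km is an assumed value and the function name is hypothetical, neither being given in the text.

```python
import math

# Assumption: mean Earth radius of 6,371 km (not stated in the text).
EARTH_RADIUS_M = 6.371e6


def mas_to_surface_cm(angle_mas, radius_m=EARTH_RADIUS_M):
    """Convert a polar motion angle in milliarcseconds to an arc length in cm."""
    # 1 mas = 1/1000 arcsecond = 1/(3600 * 1000) degree.
    angle_rad = angle_mas * math.pi / (180.0 * 3600.0 * 1000.0)
    return angle_rad * radius_m * 100.0


one_mas_cm = mas_to_surface_cm(1.0)
```

With this radius, `one_mas_cm` rounds to the 3.09 cm given in the caption.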

The variations in Earth rotation are captured by Liouville’s equation1,2 for Earth’s angular momentum

$$\frac{\partial }{\partial t}\left({{{{\bf{h}}}}}(t)+I(t){{{\bf{\upomega }}}}(t)\right)+{{{\bf{\upomega }}}}(t)\times \left({{{{\bf{h}}}}}(t)+I(t){{{\bf{\upomega }}}}(t)\right)={{{{\bf{\uptau} }}}}(t)\,,$$
(1)

in which t is time, ω(t) = Ωo(xp, − yp, 1) is the rotation vector, I(t) the inertia tensor, h(t) the angular momentum caused by motion with respect to the rotating reference frame tied to the mantle and τ(t) the external gravitational torque from the Sun, Moon and other celestial bodies. By definition, τ(t) has no effect on the long-period polar motion because it is absorbed by the induced time variation of the celestial pole, that is, precession and nutation. The long-period polar motion is, therefore, a consequence of conservation of Earth’s angular momentum and results from mass and angular momentum exchange between and within the Earth system components. Atmospheric winds, oceanic currents and core flows carry angular momentum and may induce polar motion through h(t). Processes that redistribute mass on Earth’s surface or in the interior, such as ice melting, groundwater depletion, sea-level rise and solid Earth deformation, induce polar motion through a perturbation of the inertia tensor I(t).

Deglaciation of the Northern Hemispheric ice sheets since the last glacial maximum10 and the induced response of the viscoelastic solid Earth towards isostasy, a process called glacial isostatic adjustment (GIA), is often invoked as the main mechanism responsible for the observed secular trend11,12,13,14,15. This provides a constraint on the profile of mantle viscosity and deglaciation history on which the direction and rate of the trend depends11,12,14, and the observed polar motion has thus given us key insights into the deglaciation history and physics of GIA processes. However, other processes can also contribute considerably to the secular trend. These include mantle convection (MC)15,16,17 and climate-change-induced land–ocean mass exchanges, in particular the melting of mountain glaciers and polar ice sheets, depletion of the terrestrial water storage (TWS) and associated rise of sea levels15,18. Coseismic and interseismic deformations have also been shown to contribute to a small part of the trend19,20,21.

The cause of the multidecadal fluctuations is more ambiguous. High-fidelity satellite gravimetry data spanning the past two decades have enabled reliable quantitative monitoring of global-scale surface ice/water mass redistribution. The TWS variability can explain the interannual polar motion in the past three decades18,22,23. In this way, polar motion fluctuations provide an additional way to monitor the surface mass transport driven by climate variations. However, the lack of credible reconstruction of TWS variability in the 20th century prevents a convincing demonstration that this is also the cause of earlier multidecadal fluctuations. Decadal variations in core flow produce time-dependent topographic and electromagnetic torques at the core–mantle boundary (CMB) that can generate polar motion fluctuations of a few mas (refs. 24,25,26), as can variations in the tilt of the oblate inner core figure27,28. However, the degree to which the core contributes to the observed fluctuations in polar motion is difficult to quantify.

As the above discussion illustrates, the observed long-period polar motion reflects the integrated effect of a multitude of processes acting at Earth’s surface and within its interior. A better understanding of this signal therefore offers a unique opportunity to constrain and advance our knowledge of these processes. The majority of previous studies have focused on individual geophysical processes or covered periods much shorter than the full 120-year-long observational record. A comprehensive polar motion reconstruction using forward models is difficult owing to the incomplete understanding of underlying physical processes and insufficient data to constrain the model parameters. In addition, such an approach overlooks possible (nonlinear) feedback between processes and may not reconstruct the observed polar motion adequately. Here we present an approach in which all the known contributions to polar motion are treated jointly by a machine learning algorithm based on neural networks. More specifically, we use physics-informed neural networks (PINNs29,30), constrained to satisfy geophysical models and data associated with individual processes. PINNs have proven effective at uncovering connections between processes and recovering the underlying physics even when imperfect physical models of these processes are used31.

PINNs architecture for training and prediction

The PINNs used in this study are a series of individual yet interacting neural networks that are trained to fit a set of observations while also capturing underlying geophysical processes. We use a total of 16 different neural networks denoted by Mi, i = 1, …, 16. M1 and M2 are the neural networks that learn the observed long-period polar motion xp and yp, respectively, shown in Fig. 1b. Several geophysical processes that are responsible for driving the observed polar motion are represented via Mi, i = 3, …, 16. We use a six-layer perceptron32 with 32 neurons per layer for the individual neural networks. We employ hyperbolic tangent activation functions except for the last layer, for which we use a linear function. Methods provides a detailed description of the PINNs architecture.

Figure 2 summarizes the architecture of the PINNs and the physical processes represented by the individual neural networks. We categorize the geophysical processes into four separate groups. We collectively refer to all processes related to Earth’s surface mass redistribution as barystatic processes33. These include glaciers and ice sheet mass balance, TWS variability and associated changes in sea levels. Two neural networks (M3, M4) are assigned to capture the barystatic mass redistribution. GIA and MC are long-term geophysical processes that contribute to the secular polar motion and are modelled together in a single framework. They require four networks (M5–M8), each connected to changes in the Earth’s moment of inertia tensor. Core dynamics processes denote all contributions to polar motion connected to the core, including core–mantle interactions. They involve six networks: two (M9, M10) for the equatorial components of the time-dependent torque and pressure applied by the fluid outer core at the CMB; two (M11, M12) for the torque applied at the inner core boundary (ICB); and two (M13, M14) to track the equatorial components of the inner core tilt induced by the ICB torque. The latter also depend on the period and quality factor of the inner core wobble, which are left as parameters to be determined by PINNs (Methods). Finally, we take seismic processes to refer to all contributions related to coseismic and interseismic deformations, which require two networks (M15, M16). A detailed description of these processes and their PINN representation is given in Methods.

Fig. 2: The method to model long-period polar motion.

The geophysical processes are modelled by 16 different neural networks Mi that interact with each other. M1, M2 are used for learning the polar motion components xp, yp; M3, M4 for barystatic processes; M5, M6, M7, M8 for GIA and MC; M9, M10, M11, M12, M13, M14 for core dynamics processes; and M15, M16 for seismic processes. We use a six-layer perceptron (layers Li, i = 1, …, 6) with 32 neurons each (neurons \({N}_{\!j}^{i},\,i=1,\ldots ,6,\,j=1,\ldots ,32\)) as the architecture for each neural network (Methods). The orange circles denote the activation functions (hyperbolic tangent for the first five layers, linear for the last layer).

We train our PINNs to fit a set of observations for the period 1 January 1976 to 31 December 2018. The choice of this training period is motivated by the smaller measurement errors in more recent polar motion data2 and by the unavailability of accurate seismic data before 1976. We use the trained PINNs to predict the polar motion and uncover contributions from geophysical processes at earlier times, that is, 1900–1975. It is important to note that PINNs are blind to all geophysical and geodetic observations, including polar motion, outside the training period. The observed polar motion before 1976 is used only to compare against the predictions made by PINNs. The learnable parameters of the neural networks are initialized to random values and updated through iterative optimization. We use 100 separate realizations of the PINNs, each with a different set of randomized initial values. This allows us to capture the sensitivity of PINNs to initialization and reliably quantify the uncertainty in the reconstructed polar motion.

Polar motion reconstruction and attribution

Figure 3 shows the comparison between the observed polar motion and that predicted by the PINNs. We show the results of three separate experiments. First, we consider a case where the neural networks are not informed by any geophysical process. With no imposed geophysical constraints, the prediction is a poor match of the observed signal. Next, we show the prediction when all the geophysical processes except core dynamics are included. Our motivation here is rooted in previous studies, which often overlook core processes15. The match to the observed polar motion is considerably improved, though some important differences remain for yp. Finally, we show our most comprehensive prediction when all geophysical processes are considered jointly, including the influence of core dynamics. For this optimal solution, the main features of observed polar motion are reconstructed within the observational uncertainties. The good fit to the data in the whole time interval, solely based on geodetic and geophysical information after 1976, suggests that we have adequately captured the underlying processes driving polar motion through their combined action. We note that the choice of training and prediction intervals does not fundamentally alter the ability of the PINNs to successfully reconstruct the polar motion (Supplementary Materials 7).

Fig. 3: Observed and predicted polar motion.

a,b, Comparison between the observed polar motion xp (a) and yp (b) and the predictions (denoted by \({\hat{x}}_\mathrm{p},\,{\hat{y}}_\mathrm{p}\)) of the PINNs for three cases: no geophysical constraints (‘no processes’), all geophysical processes except core dynamics (‘no core dynamics’), full set of geophysical processes (‘all processes’). The training period (1976–2018) is indicated by the shaded blue area. The uncertainties (1σ) associated with the observed signal and each prediction are indicated by their shaded envelopes. c, The RMSE of the modelled versus observed polar motion series for different combinations of processes (B-SL: barystatic and sea-level; GIA-MC: GIA and mantle convection; CD: core dynamics; and EQ: seismic). The red dashed line indicates the mean error of the observed polar motion. The RMSE when using no geophysical constraints is 134.53 mas.

To evaluate the importance of the individual geophysical processes, we carry out separate experiments for all 15 possible combinations of the investigated processes: individual (4), double (6), triplet (4) and all combined (1). We use the same training and prediction intervals. For each experiment, we compute an average root mean squared error (RMSE; Methods) based on the difference in xp and yp between the predicted and observed polar motion. A smaller RMSE implies a better reconstruction of the observations (Fig. 3c). Individually, the most important processes, in order, are barystatic, GIA and MC, core dynamics and seismic. All combinations that include barystatic processes provide a better prediction than when they are absent. Barystatic, GIA and MC together result in a better fit than when they are considered individually. Adding core dynamics to this combination noticeably reduces the RMSE, in line with the visual improvement of the fit shown in Fig. 3a,b. Our results imply that certain features of the polar motion are unequivocally the result of core dynamics. However, this happens only when core processes are considered alongside other processes; in isolation, the ability of core dynamics to predict polar motion is rather poor. Seismic processes contribute only to a small improvement in the fit, but their addition to any combination of processes invariably improves the prediction (Extended Data Fig. 6).
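The count of 15 experiments follows from taking every non-empty subset of the four process groups. A minimal sketch of that enumeration is given here; the labels reuse the abbreviations from Fig. 3c, while the function and variable names are hypothetical and not taken from the study's code.

```python
from itertools import combinations

# Process group labels as abbreviated in Fig. 3c.
PROCESSES = ("B-SL", "GIA-MC", "CD", "EQ")


def process_combinations(processes=PROCESSES):
    """Return every non-empty subset of the process groups, smallest first."""
    combos = []
    for size in range(1, len(processes) + 1):
        combos.extend(combinations(processes, size))
    return combos


experiments = process_combinations()
# Number of experiments per subset size: individual, double, triplet, all.
counts = {size: sum(1 for c in experiments if len(c) == size)
          for size in range(1, len(PROCESSES) + 1)}
```

Here `counts` reproduces the breakdown in the text: 4 individual, 6 double, 4 triplet and 1 combined experiment.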

Figure 4 shows the source partitioning for our optimal solution. For each process, we build a forward model of its contribution to polar motion based on the solutions of the neural networks that model this process. The linear sum of these individual contributions provides an excellent fit to the observed signal. This indicates that to first order, the observed polar motion results from the linear superposition of the geophysical processes considered in our analysis. However, we note that the linear superposition does not provide a fit as good as that from the PINNs that model the polar motion directly via M1 and M2 with constraints from all geophysical processes. The difference is small (Table 1) but it nevertheless suggests possible nonlinear interactions between the processes.

Fig. 4: The individual contribution of each process to long-period polar motion.

a,b, Comparison between the observed polar motion xp (a) and yp (b) and the individual contributions of the different processes (denoted by \({\hat{x}}_\mathrm{p},\,{\hat{y}}_\mathrm{p}\)) based on the solutions of their associated PINNs. The dotted lines correspond to the sum of the individual processes. All predicted signals are shown as their ensemble mean. The uncertainties (1σ) associated with the observed signal and each prediction are indicated by their shaded envelopes.

Table 1 Contributions to the secular trend in mas per year and R2 score, together with the uncertainty

Implications for constraining dynamical processes

The contributions of each geophysical process to the secular trend and long-period fluctuations for our optimal solution are summarized in Table 1. For the secular trend contributions, we perform least squares regressions on the reconstructed signals over the whole time interval 1900–2018. Since the linear sum (a secular trend of 2.82 mas per year in direction 71° W) is sufficiently close to the observed polar motion trend (2.65 mas per year in direction 72° W), the contributions from individual processes can be extracted with reasonable confidence. The largest fraction of the secular trend is caused by GIA and MC, which account for a rate of 2.69 mas per year towards 81.5° W. This trend is independent of the choice of plausible mantle viscosity profiles (Supplementary Materials 7). When PINNs are trained with specific MC-driven polar motion rates and directions, the relative contributions from GIA and MC vary, although their net sum does not change, and the PINNs solutions for other processes also remain unaltered. When we train our PINNs to match the mean rate and direction of polar motion predicted by an ensemble of mantle convection models15,16, we find that the secular trend from MC is 0.77 mas per year towards 93.20° W, whereas that from GIA is 1.93 mas per year towards 76.91° W (Extended Data Fig. 1), consistent with that predicted on the basis of the rate of geoid change (Supplementary Materials 7). The rate and direction from the combined effects of GIA and MC that we retrieve are robust, although their source partitioning depends on the accuracy of the mantle flow models.
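Because secular trends are vectors in the (xp, yp) plane, the separate MC and GIA rates quoted above should add up to the combined GIA and MC trend. The sketch below performs that consistency check. It assumes only the axis convention stated earlier (xp positive towards Greenwich, yp positive towards 90° W), so that a direction given as a west longitude λ has components (cos λ, sin λ); the helper names are hypothetical.

```python
import math


def to_components(rate, west_longitude_deg):
    """Split a trend (rate, direction in degrees W) into xp and yp components."""
    lam = math.radians(west_longitude_deg)
    return rate * math.cos(lam), rate * math.sin(lam)


def to_rate_direction(x_rate, y_rate):
    """Convert xp and yp trend components back to (rate, direction in degrees W)."""
    return math.hypot(x_rate, y_rate), math.degrees(math.atan2(y_rate, x_rate))


# Values quoted in the text, in mas per year and degrees W.
mc = to_components(0.77, 93.20)
gia = to_components(1.93, 76.91)
rate, direction = to_rate_direction(mc[0] + gia[0], mc[1] + gia[1])
```

The vector sum comes out close to the quoted combined trend of 2.69 mas per year towards 81.5° W, with the small difference in rate attributable to rounding of the published values.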

Core dynamics contribute a secular trend of 0.54 mas per year in the direction 32.8° W. The most likely cause of this drift is a topographic torque caused by core flow acting on the CMB topography. Although it has been shown that such a torque drives a secular trend34, its contribution has been generally overlooked. The trend driven by this torque depends on the core flow geometry, the CMB topography and the mantle viscosity. Our results thus provide motivation for a renewed investigation of this core-driven drift and offer the prospect of further constraining dynamical processes acting at the CMB.

Our optimal solution features a relatively weak secular trend from barystatic processes (0.35 mas per year towards 76° E), substantially weaker than that predicted from forward models of surface mass redistribution and also pointing in a different direction15 (Extended Data Fig. 2 and Supplementary Materials 6). In these surface mass models, the trend is largely dominated by the melting of the Greenland Ice Sheet and global glaciers, in addition to groundwater depletion15. We argue that this points to an overestimation of the twentieth century Greenland ice loss, partly owing to the lack of precise timing of the maximum ice sheet extent during the Little Ice Age35,36,37. Furthermore, surface mass models probably underestimate contributions to the trend from other sources, particularly glaciers melting in the Himalayas and groundwater depletion in the Indian subcontinent23. A different trend recovered by our approach motivates a reassessment of the causes of historical sea-level rise38.

Although barystatic processes only contribute weakly to the secular trend, they explain approximately 90% of the observed interannual and multidecadal polar motion fluctuations (Table 1). This confirms that the surface mass redistribution, which accounts for the interannual polar motion changes over the past few decades18,22,23, also explains the multidecadal fluctuations observed throughout the twentieth century (Extended Data Figs. 4 and 5). Variations in TWS are the likely dominant driver of this signal, tied to multidecadal shifts in global wet/dry conditions driven by various climate indices39,40,41,42,43. The successful reproduction of the polar motion fluctuations during 1900–1975 solely based on information from 1976–2018 implies that these large-scale hydrological changes are not stochastic but are driven by natural quasi-periodic cycles in the climate system44,45. Our results emphasize how multidecadal polar motions provide a unique constraint for reconstructing and projecting global-scale water storage variability.

The remainder of the observed multidecadal polar motion is mainly caused by core dynamics (Fig. 4), through a time-varying torque at the CMB (Extended Data Fig. 3). Such a torque can readily induce the fluctuations of the order of 10 mas that we see in our solution24,25,26. This non-negligible multidecadal signal provides a useful constraint for the factors on which this torque depends, namely core flow, the CMB topography and the electrical conductivity of the lowermost mantle. As a further indication that our solution adequately captures core dynamics, we recover as part of our solution an inner core wobble period of 7.8 years, close to that suggested by theory (~7.5 years) (refs. 46,47) and observations (~8.5 years) (ref. 48).

The solution described above represents the optimal reconstruction of the polar motion by the PINNs given the imposed set of geophysical models and data. Different choices of models and data could alter the resulting solution and its geophysical/climatological interpretation. Our algorithm cannot recover stochastic events that occurred before 1976, such as sudden shifts in polar motion induced by large earthquakes19,20,21. Despite these caveats, the ability of our solution to correctly predict the polar motion solely based on observations after 1976 provides strong support for its fidelity and our interpretation of contributing processes.

Our optimal solution provides ample motivation for future research, and we have listed a few directions above. Adding to this list, we note that our solution also points to a systematic anticorrelation between core and barystatic signals (Fig. 4) suggestive of a feedback mechanism. We explored the strength of this feedback (Supplementary Materials 8) and find that a perturbation in barystatic or core excitation results in a 6–8% change in polar motion driven by the other process. Whereas this simple experiment does not identify the nature of the coupling, it demonstrates a possible dynamic link between surface processes and core dynamics, an intriguing topic for future exploration.

Methods

Long-period polar motion

We apply an iterative low-pass filter to the C01 time series of the IERS49,50 to derive long-period polar motion. First, we determine the power spectral density (PSD) as a function of period. The two periods shorter than 500 days that have the highest PSD values, denoted by T1, T2, are used to fit a harmonic model to both xp and yp separately, as

$${x}_\mathrm{p}(t)={a}_{1}\cos \left(\frac{2\uppi t}{{T}_{1}}\right)+{b}_{1}\sin \left(\frac{2\uppi t}{{T}_{1}}\right)+{c}_{1}\cos \left(\frac{2\uppi t}{{T}_{2}}\right)+{d}_{1}\sin \left(\frac{2\uppi t}{{T}_{2}}\right)\,,$$
(2a)
$${y}_\mathrm{p}(t)={a}_{2}\cos \left(\frac{2\uppi t}{{T}_{1}}\right)+{b}_{2}\sin \left(\frac{2\uppi t}{{T}_{1}}\right)+{c}_{2}\cos \left(\frac{2\uppi t}{{T}_{2}}\right)+{d}_{2}\sin \left(\frac{2\uppi t}{{T}_{2}}\right)\,,$$
(2b)

where the coefficients a1, a2, b1, b2, c1, c2, d1, d2 are derived by a least squares fit. We then subtract the fitted models from xp and yp. We build a new PSD from this filtered signal, and the above procedure is repeated iteratively until the maximum PSD of the filtered signal is smaller than \(10^{4}\,{\mathrm{mas}}^{2}\), a value sufficiently low that all signals with periods shorter than 500 days have been effectively removed. We further remove the beat period of the Chandler and annual periods by applying a six-year moving average filter. Finally, we subtract the mean in the interval 2002–2018 from the time series to be consistent with the available barystatic dataset38. The resulting long-period polar motion signal is shown in Fig. 1b.
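One pass of the harmonic removal in equations (2a) and (2b) can be sketched as follows. This is a simplified illustration rather than the processing code used in the study: it assumes the two periods T1 and T2 have already been picked from the PSD, solves the least squares problem through the normal equations, and omits the PSD estimation, the iteration and the moving average. All function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def harmonic_basis(t, periods):
    """Cosine and sine terms of equation (2) at time t, one pair per period."""
    row = []
    for period in periods:
        phase = 2.0 * math.pi * t / period
        row += [math.cos(phase), math.sin(phase)]
    return row


def remove_harmonics(times, values, periods):
    """Fit the harmonic model by least squares and subtract it from the series."""
    rows = [harmonic_basis(t, periods) for t in times]
    k = len(rows[0])
    # Normal equations (A^T A) c = A^T y for the coefficients a, b, c, d.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    coeffs = solve_linear(ata, aty)
    fitted = [sum(c * x for c, x in zip(coeffs, r)) for r in rows]
    return [v - f for v, f in zip(values, fitted)], coeffs
```

Applied once to xp and once to yp with the same pair of periods, the function returns the filtered series together with the fitted coefficients in the order (a, b, c, d) of equation (2). In the full procedure this step would be repeated with new periods until the PSD threshold is met.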

Evaluation metrics

The difference between the observed polar motion (xp, yp) and the prediction \(({\hat{x}}_\mathrm{p},{\hat{y}}_\mathrm{p})\) by the neural networks (M1, M2) is reported in terms of a root mean squared error (RMSE) defined as

$$\,{{\mbox{RMSE}}}\,=\sqrt{\frac{1}{2{N}_\mathrm{p}}\sum_{j=1}^{{N}_\mathrm{p}}\left(| {x}_\mathrm{p}({t}_{\!j})-{\hat{x}}_\mathrm{p}({t}_{\!j}){| }^{2}+| {\;y}_\mathrm{p}({t}_{\!j})-{\hat{y}}_\mathrm{p}({t}_{\!j}){| }^{2}\right)}\,,$$
(3)

where Np = 1,520 is the number of discrete epochs tj in the prediction time interval (1900–1975).
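Equation (3) translates directly into code. The sketch below uses plain Python lists for the observed and predicted series; the function name and argument names are hypothetical.

```python
import math


def rmse(x_obs, y_obs, x_pred, y_pred):
    """Root mean squared error over both polar motion components, equation (3)."""
    n_p = len(x_obs)
    total = sum((xo - xh) ** 2 + (yo - yh) ** 2
                for xo, yo, xh, yh in zip(x_obs, y_obs, x_pred, y_pred))
    # The factor 2 averages over the two components as well as the epochs.
    return math.sqrt(total / (2 * n_p))
```

Because the sum is divided by 2Np, the result is an average error per component, in the same units (mas) as the input series.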

The degree to which a single geophysical process, or a combination of processes, can explain the interannual and decadal polar motion variations (that is, a departure from its secular trend) is measured in terms of a coefficient of determination, or R2 score, defined as

$${R}^{2}\,{{\mathrm{score}}}\,=100\%\left(1-\frac{\sum_{\!j = 1}^{{N}_{t}}{\left(o({t}_{\!j})-p({t}_{\!j})\right)}^{2}}{\sum_{\!j = 1}^{{N}_{t}}{\left(o({t}_{\!j})-\bar{o}\right)}^{2}}\right),$$
(4)

where p(tj) is the reconstructed signal of a geophysical process, o(tj) is the observed polar motion and \(\bar{o}\) the mean of observed polar motion, all at time tj and after a secular trend has been removed from each of these signals. The R2 score is computed over both the prediction (1900–1975) and training (1976–2018, N = 860) periods, so the total number of discretized time points is Nt = N + Np = 2,380. We compute the R2 score separately for the two polar motion components, so o(tj) represents either xp(tj) or yp(tj).
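A minimal sketch of equation (4) is given below. It assumes that the secular trend is removed by fitting a straight line with ordinary least squares, which is consistent with the regression described for Table 1 but is not spelled out here; the helper names are hypothetical.

```python
def detrend(times, values):
    """Subtract the least squares straight line (assumed form of the secular trend)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
             / sum((t - t_mean) ** 2 for t in times))
    return [v - (v_mean + slope * (t - t_mean)) for t, v in zip(times, values)]


def r2_score_percent(observed, predicted):
    """Coefficient of determination in percent, equation (4)."""
    o_mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - o_mean) ** 2 for o in observed)
    return 100.0 * (1.0 - ss_res / ss_tot)
```

In use, both the observed component and the reconstructed signal would be passed through `detrend` before calling `r2_score_percent`, once for xp and once for yp. Note that the score can be negative when a reconstruction fits worse than the mean of the observations.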

PINNs architecture

Through extensive analyses, we found that a six-layer perceptron32 with 32 hidden neurons in each layer is the optimal architecture, which we then use as the basis for all the neural networks Mi(t) shown in Fig. 2. Each of the models Mi, i = 1, …, 16 is defined as a function of the input variable time t as follows

$$\begin{array}{ll}{M}_{i}(t)={A}_{i,6}+{B}_{i,6}\sigma\big({A}_{i,5}+{B}_{i,5}\sigma\big({A}_{i,4}+{B}_{i,4}\sigma\big({A}_{i,3}+{B}_{i,3}\sigma\\\qquad\quad\;\;\big({A}_{i,2}+{B}_{i,2}\sigma \big({A}_{i,1}+{B}_{i,1}t\big)\big)\big)\big)\big)\,,\qquad t\in {\mathbb{R}}\,,\end{array}$$
(5a)
$$\sigma (x)=\frac{{e}^{\;x}-{e}^{-x}}{{e}^{x}+{e}^{-x}}\,,\qquad \forall x\in {\mathbb{R}}\,,$$
(5b)

where σ(x) is the hyperbolic tangent activation function and Ai,j, Bi,j, j = 1, …, 6 are the learnable parameters of the neural networks that are optimized in the training phase. We use the L-BFGS algorithm51 for optimization because, in our case, it outperformed alternatives such as Adam52. It is through this optimization that the mismatch between the neural networks and both the polar motion data and the geophysical constraints is minimized (according to a loss function). Further information on the PINNs architecture and computation is given in Supplementary Materials 1.
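The forward pass of equation (5a) can be illustrated with a small dependency-free sketch. Several details are assumptions made for illustration and are not taken from the study: the last layer is taken to map the 32 hidden values to a single scalar output, the initialization scheme is arbitrary, and training (L-BFGS with automatic differentiation in a deep learning framework) is not shown.

```python
import math
import random

HIDDEN = 32
# Assumed layer widths: scalar time in, five hidden layers, scalar signal out.
LAYER_SIZES = [1] + [HIDDEN] * 5 + [1]


def init_params(seed):
    """Random initial (A, B) pairs for the six affine maps of equation (5a)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        # Arbitrary illustrative initialization, scaled by the input width.
        weights = [[rng.uniform(-1.0, 1.0) / math.sqrt(n_in) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((biases, weights))
    return params


def forward(params, t):
    """Evaluate M_i(t): tanh after the first five affine maps, linear on the last."""
    h = [t]
    last = len(params) - 1
    for k, (biases, weights) in enumerate(params):
        z = [b + sum(w * x for w, x in zip(row, h)) for b, row in zip(biases, weights)]
        h = z if k == last else [math.tanh(v) for v in z]
    return h[0]


def count_params(params):
    """Total number of learnable parameters in one network."""
    return sum(len(b) + sum(len(row) for row in w) for b, w in params)
```

Under these assumptions each network has 4,321 learnable parameters. Calling `init_params` with different seeds mimics the idea of training several realizations from different random starting points.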

PINNs loss function

The PINNs are trained to satisfy a set of geophysical data and constraints. This is achieved by minimizing a set of loss functions that are defined as the mean squared error between a geophysical constraint and a neural network. The loss functions are defined for the long-period polar motion signal xp, yp and for the four categories of geophysical processes. The overall loss function is a sum of these individual loss functions which, conceptually, can be written as

$$\begin{array}{l}{{\mathrm{Loss}}}\,=\,{{\mathrm{Loss}}}\,\left({x}_{\mathrm{p}},\,{M}_{1}(t)\right)+\,{{\mathrm{Loss}}}\,\left(\;{y}_{\mathrm{p}},\,{M}_{2}(t)\right)+\,{{\mathrm{Loss}}}\,\left({{\mathrm{barystatic}}}\right)\\\qquad\quad+{{\mathrm{Loss}}}\,\left({{\mathrm{GIA}}\,{\mathrm{and}}\,{\mathrm{MC}}}\right)+\,{{\mathrm{Loss}}}\,\left({{\mathrm{core}}\,{\mathrm{dynamics}}}\right)+{{\mathrm{Loss}}}\,\left({{\mathrm{seismic}}}\right).\end{array}$$
(6)

The individual loss functions for xp and yp are given by

$${{\mathrm{Loss}}}\,\left({x}_\mathrm{p},\,{M}_{1}(t)\right)=\frac{1}{N}\sum_{j=1}^{N}{\left({x}_\mathrm{p}({t}_{\!j})-{M}_{1}({t}_{\!j})\right)}^{2},$$
(7a)
$$\,{{\mathrm{Loss}}}\,\left({y}_\mathrm{p},\,{M}_{2}(t)\right)=\frac{1}{N}\sum_{j=1}^{N}{\left({y}_\mathrm{p}({t}_{\!j})-{M}_{2}({t}_{\!j})\right)}^{2}\,,$$
(7b)

where N denotes the number of observations in the training phase. The loss functions associated with individual geophysical processes are defined below.

Modelling barystatic processes

Barystatic processes and the accompanying relative sea-level change33 result in significant mass redistribution in the land–ocean system. The polar motion P = xp − iyp (where \(i=\sqrt{-1}\)) that results from these processes satisfies the Liouville equation1,2,53

$$P+\frac{i}{{\sigma }_\mathrm{cw}}\frac{\mathrm{d}P}{\mathrm{d}t}=\frac{{k}_\mathrm{s}}{{k}_\mathrm{s}-{k}_{2}}(1+{k}_{2}^{{\prime} })\chi \,,$$
(8)

where k2 = 0.3055, ks = 0.942 and \({k}_{2}^{{\prime} }=-0.30\) are the degree 2 tidal, secular and load Love numbers, respectively, and σcw is the complex frequency of the Chandler wobble \(\scriptstyle{\sigma }_\mathrm{cw}=\frac{2\uppi }{{T}_\mathrm{cw}}\left(1+\frac{i}{2{Q}_\mathrm{cw}}\right)\) with period Tcw = 433 days and quality factor Qcw = 170. The excitation function χ = χ1 + iχ2 captures how barystatic processes drive polar motion. We build a time-dependent model of χ on the basis of a dataset38 covering the range 1900–2018, containing 100 climate models of the spatio-temporal pattern of mass change in the Antarctic ice sheet, Greenland ice sheet, glaciers and TWS. We follow the procedure detailed in refs. 54,18 and summarized in section 3 of Supplementary Materials. Forward model predictions of the polar motion driven by barystatic processes are shown in Extended Data Figs. 4 and 5 (and also Supplementary Figs. 9 and 10).

We train neural networks M3(t) and M4(t) to obey χ1(t) and χ2(t), respectively. We thus seek to minimize the difference between M3(t) and χ1(t) and between M4(t) and χ2(t), while also minimizing the residual of each of the two components of equation (8), which are

$$p\frac{\mathrm{d}{x}_\mathrm{p}}{\mathrm{d}t}+q\frac{\mathrm{d}{y}_\mathrm{p}}{\mathrm{d}t}+{x}_\mathrm{p}=\frac{{k}_\mathrm{s}}{{k}_\mathrm{s}-{k}_{2}}(1+{k}_{2}^{{\prime} }){\chi }_{1}\,,$$
(9a)
$$q\frac{\mathrm{d}{x}_\mathrm{p}}{\mathrm{d}t}-p\frac{\mathrm{d}{y}_\mathrm{p}}{\mathrm{d}t}-{y}_\mathrm{p}=\frac{{k}_\mathrm{s}}{{k}_\mathrm{s}-{k}_{2}}(1+{k}_{2}^{{\prime} }){\chi }_{2}\,,$$
(9b)

where p = 0.2027, q = 68.9135. The loss function associated with barystatic processes is written as a sum of four individual terms which enter equation (6) as

$$\begin{array}{l}{{\mathrm{Loss}}}\,\left({{\mathrm{barystatic}}}\right)=\\ \frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left({\chi}_{1}({t}_{\!j})-{M}_{3}({t}_{\!j})\right)}^{2}+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left({\chi }_{2}({t}_{\!j})-{M}_{4}({t}_{\!j})\right)}^{2}\\ +\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left(p\frac{\mathrm{d}{M}_{1}({t}_{j})}{\mathrm{d}t}+q\frac{\mathrm{d}{M}_{2}({t}_{j})}{\mathrm{d}t}+{M}_{1}({t}_{\!j})-\frac{{k}_\mathrm{s}}{{k}_\mathrm{s}-{k}_{2}}(1+{k}_{2}^{{\prime}}){M}_{3}({t}_{\!j})\right)}^{2}\\+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left(q\frac{\mathrm{d}{M}_{1}({t}_{\!j})}{\mathrm{d}t}-p\frac{\mathrm{d}{M}_{2}({t}_{\!j})}{\mathrm{d}t}-{M}_{2}({t}_{\!j})-\frac{{k}_\mathrm{s}}{{k}_\mathrm{s}-{k}_{2}}(1+{k}_{2}^{{\prime}}){M}_{4}({t}_{\!j})\right)}^{2}.\end{array}$$
(10)
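The constants p and q in equations (9a) and (9b) follow from the complex coefficient i/σcw in equation (8), whose real and imaginary parts are p and q. The sketch below recomputes them from Tcw and Qcw and evaluates the two physics residuals that are squared and averaged in the last two terms of equation (10). It assumes time is measured in days (an inference from the units of Tcw, not stated explicitly) and takes the derivatives as given inputs, whereas a PINN would obtain them by automatic differentiation of M1 and M2. The function names are hypothetical.

```python
import math

T_CW_DAYS = 433.0   # Chandler wobble period
Q_CW = 170.0        # Chandler wobble quality factor
K2, KS, K2_LOAD = 0.3055, 0.942, -0.30  # tidal, secular and load Love numbers

# Complex Chandler frequency and the coefficient i / sigma_cw of equation (8).
sigma_cw = (2.0 * math.pi / T_CW_DAYS) * (1.0 + 1j / (2.0 * Q_CW))
coeff = 1j / sigma_cw
p, q = coeff.real, coeff.imag            # both in days under the stated assumption
gain = KS / (KS - K2) * (1.0 + K2_LOAD)  # factor multiplying the excitation


def liouville_residuals(xp, yp, dxp_dt, dyp_dt, chi1, chi2):
    """Residuals of equations (9a) and (9b) at a single epoch."""
    r1 = p * dxp_dt + q * dyp_dt + xp - gain * chi1
    r2 = q * dxp_dt - p * dyp_dt - yp - gain * chi2
    return r1, r2


def physics_loss(samples):
    """Mean squared residuals, as in the last two terms of equation (10)."""
    residuals = [liouville_residuals(*s) for s in samples]
    n = len(residuals)
    return (sum(r1 ** 2 for r1, _ in residuals) / n
            + sum(r2 ** 2 for _, r2 in residuals) / n)
```

Rounded to four decimals, the computed `p` and `q` match the values 0.2027 and 68.9135 given with equation (9). The residuals also agree with the real and imaginary parts of equation (8) written with P = xp − iyp.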

Modelling GIA and mantle convection processes

We model the polar motion produced by the combined effects of GIA and MC by the method presented in refs. 55,13 and detailed in section 2 of the Supplementary Materials. The two equatorial components of polar motion m1 = xp and m2 = − yp are governed by

$$\begin{array}{l}\frac{1}{{\sigma }_\mathrm{cr}}\frac{\mathrm{d}{m}_{1}}{\mathrm{d}t}+\left(1+\frac{{D}_{33}^{* }-{D}_{11}^{* }}{C-A}\right){m}_{1}(t)-\left(\frac{{D}_{12}^{* }}{C-A}\right){m}_{2}(t)=\frac{1}{C-A}\left(\delta (t)+{k}_{2}^{L}(t)\right)\\*\Delta {L}_{13}(t)+\frac{{k}_{2}^{T}(t)}{{k}_\mathrm{f}}*{m}_{1}(t)+\frac{{D}_{13}^{* }(t)}{C-A},\end{array}$$
(11a)
$$\begin{array}{l}-\frac{1}{{\sigma }_\mathrm{cr}}\frac{\mathrm{d}{m}_{2}}{\mathrm{d}t}+\left(1+\frac{{D}_{33}^{* }-{D}_{22}^{* }}{C-A}\right){m}_{2}(t)-\left(\frac{{D}_{12}^{* }}{C-A}\right){m}_{1}(t)=\frac{1}{C-A}\left(\delta (t)+{k}_{2}^{L}(t)\right)\\*\Delta {L}_{23}(t)+\frac{{k}_{2}^{T}(t)}{{k}_\mathrm{f}}*{m}_{2}(t)+\frac{{D}_{23}^{* }(t)}{C-A},\end{array}$$
(11b)

in which t is time, C and A are the polar and mean equatorial moments of inertia, σcr = Ωo(C − A)/A is the Chandler wobble frequency of a rigid Earth, kf is the fluid Love number, * denotes the convolution operator and δ(t) is the Dirac delta function. Dij denote the non-hydrostatic perturbations in the moment of inertia tensor induced by mantle convection. Of these, D13(t) and D23(t) are time-dependent and capture the process of MC driving polar motion. ΔL13(t), ΔL23(t) capture the change in the moment of inertia tensor associated with the changing surface ice load. These variables drive the polar motion caused by GIA. \({k}_{2}^{T}(t)\) and \({k}_{2}^{L}(t)\) are the time-dependent tidal and load viscoelastic Love numbers, respectively. They are expanded as a set of relaxation modes and computed as in ref. 55 based on an assumed mantle viscosity model. In our default setting, the lithosphere (from surface to depth of 100 km) is elastic, and the viscosities of the upper mantle (depth of 100 km to 670 km), lower mantle (670 km to 2,591 km) and D″ layer (2,591 km to 2,891 km) are set to \(2\times {10}^{20}\) Pa s, \(5\times {10}^{21}\) Pa s and \(5\times {10}^{18}\) Pa s, respectively. Although the choice of mantle viscosity model affects the forward model prediction of the polar motion55, it does not affect our main results (Supplementary Materials 2). The values of all parameters used in equation (11) are given in Supplementary Table 2. The two equations are solved iteratively with the sea-level equation on a rotating Earth56 to derive the polar motion generated by GIA and MC.
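The default layering described above can be written down as a small lookup table. The sketch below is only an illustration of that configuration: the names `DEFAULT_PROFILE` and `viscosity_at` are hypothetical, the quoted viscosities are read as powers of ten, and the elastic lithosphere is represented by `None` because no viscosity is assigned to it.

```python
# Default radial viscosity profile described in the text.
# Each entry: (top depth in km, bottom depth in km, viscosity in Pa s or None).
DEFAULT_PROFILE = [
    (0.0, 100.0, None),       # elastic lithosphere (no viscosity assigned)
    (100.0, 670.0, 2e20),     # upper mantle
    (670.0, 2591.0, 5e21),    # lower mantle
    (2591.0, 2891.0, 5e18),   # lowermost mantle layer above the CMB
]


def viscosity_at(depth_km, profile=DEFAULT_PROFILE):
    """Return the viscosity (Pa s) at depth_km, or None in the elastic layer."""
    for top, bottom, eta in profile:
        if top <= depth_km < bottom:
            return eta
    raise ValueError(f"depth {depth_km} km is outside the mantle profile")
```

Keeping the profile in one table makes it straightforward to swap in an alternative viscosity model for a sensitivity test.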

We assign neural networks M5, M6 to learn the time history of ΔL13(t), ΔL23(t), which is built from the ICE-7G_NA palaeotopography dataset57 that contains the change in global ice thickness in the past 26,000 years. An example of a forward model prediction of the polar motion driven by GIA is shown in Supplementary Fig. 6 for the mantle viscosity model that we use in our default setting. Over the past 120 years, it is well approximated by a secular trend at a rate of 2.70 mas per year in the direction 81.5° W (towards Hudson Bay).
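The quoted GIA trend can be resolved into its xp and yp components using the axis convention of Fig. 1 (xp positive towards Greenwich, yp positive towards 90° W) and the conversion of 3.09 cm per mas given in the Fig. 1 caption. The short calculation below is a worked check, not part of the modelling itself; the variable names are illustrative.

```python
import math

RATE_MAS_PER_YR = 2.70   # GIA trend quoted in the text
DIRECTION_DEG_W = 81.5   # direction, in degrees west of Greenwich
CM_PER_MAS = 3.09        # surface displacement per mas (Fig. 1 caption)

theta = math.radians(DIRECTION_DEG_W)
# x_p is positive towards Greenwich (0 deg), y_p towards 90 deg W.
xp_rate = RATE_MAS_PER_YR * math.cos(theta)
yp_rate = RATE_MAS_PER_YR * math.sin(theta)

years = 120
total_mas = RATE_MAS_PER_YR * years
total_m = total_mas * CM_PER_MAS / 100.0

print(f"x_p rate: {xp_rate:.3f} mas/yr, y_p rate: {yp_rate:.3f} mas/yr")
print(f"drift over {years} yr: {total_mas:.0f} mas (about {total_m:.1f} m)")
```

The trend is almost entirely in yp (about 2.67 mas per year, against about 0.40 mas per year in xp), and over 120 years it accumulates to roughly 324 mas, or about 10 m at Earth's surface.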

Neural networks M7, M8 are assigned to the time histories of the driving terms D13(t) and D23(t) from MC. Similar to GIA, the polar motion in the past 120 years driven by MC is well approximated by a linear trend, although the precise rate and direction of this trend are difficult to determine from observations15,16,17. In our default setting, M7 and M8 are left unconstrained and are determined freely by the PINNs solution. We have also carried out additional numerical experiments in which we constrain M7 and M8 to specific true polar wander rates and directions (Supplementary Materials 2).

Equations (11a,b) are more conveniently written in the Laplace transform domain58. The loss function of GIA and MC in PINNs is composed of four individual terms and is written as

$$\begin{array}{ll}{{\mathrm{Loss}}}\,\left({{\mathrm{GIA}}\,{\mathrm{and}}\, {\mathrm{MC}}}\right)=\\ \frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left(\Delta {L}_{13}({s}_{\!j})-{M}_{5}({s}_{\!j})\right)}^{2}+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left(\Delta {L}_{23}({s}_{\!j})-{M}_{6}({s}_{\!j})\right)}^{2}\\+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}\left(\frac{{s}_{\!j}}{{\sigma }_{\mathrm{cr}}}{M}_{1}({s}_{\!j})+\left(1+\frac{{D}_{33}^{* }-{D}_{11}^{* }}{C-A}\right){M}_{1}({s}_{\!j})-\left(\frac{{D}_{12}^{* }}{C-A}\right){M}_{2}({s}_{\!j})\right.\\\left.-\frac{1}{C-A}\left(\Delta {L}_{13}(0)+{k}_{2}^{L}({s}_{\!j}){M}_{5}({s}_{\!j})\right)-\right.{\left.\frac{{k}_{2}^{T}({s}_{\!j}){M}_{1}({s}_{\!j})}{{k}_{\mathrm{f}}}-\frac{{M}_{7}({s}_{\!j})}{C-A}\right)}^{2}\\+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}\left(-\frac{{s}_{\!j}}{{\sigma }_{\mathrm{cr}}}{M}_{2}({s}_{\!j})+\left(1+\frac{{D}_{33}^{* }-{D}_{22}^{* }}{C-A}\right){M}_{2}({s}_{\!j})-\left(\frac{{D}_{12}^{* }}{C-A}\right){M}_{1}({s}_{\!j})\right.\\\left.-\frac{1}{C-A}\big(\Delta {L}_{23}(0)+{k}_{2}^{L}({s}_{\!j}){M}_{6}({s}_{\!j})\big)-\right.{\left.\frac{{k}_{2}^{T}({s}_{\!j}){M}_{2}({s}_{\!j})}{{k}_\mathrm{f}}-\frac{{M}_{8}({s}_{\!j})}{C-A}\right)}^{2},\end{array}$$
(12)

where sj are N discretized values of the Laplace variable s and ΔL13(0), ΔL23(0) are the initial values of ΔL13(t), ΔL23(t) 26,000 years ago.

Modelling core dynamics processes

We use a modified version of the model presented in ref. 27 to capture the effects of core dynamics on polar motion (Supplementary Materials 4). We compute the polar motion P = xp − iyp that results from a torque at the CMB \({\tilde{\Gamma }}_\mathrm{m}={\Gamma }_\mathrm{m1}+i{\Gamma }_\mathrm{m2}\). We also compute the tilt of the inner core figure \({\tilde{n}}_\mathrm{s}={x}_\mathrm{s}-i{y}_\mathrm{s}\) generated by a torque at the ICB \({\tilde{\Gamma }}_\mathrm{s}={\Gamma }_\mathrm{s1}+i{\Gamma }_\mathrm{s2}\) and the polar motion that this tilt causes as a result of gravitational coupling with the mantle. The equations for P and \({\tilde{n}}_\mathrm{s}\) are

$$\frac{\mathrm{d}P}{\mathrm{d}t}=i{\sigma }_\mathrm{cw}P+i{\Omega }_\mathrm{o}\frac{{A}_\mathrm{s}}{{A}_\mathrm{m}}{e}_\mathrm{s}{\alpha }_{3}({\alpha }_\mathrm{g}-{\kappa }_\mathrm{s}){\tilde{n}}_\mathrm{s}+\frac{{\tilde{\Gamma }}_\mathrm{m}}{{\Omega }_\mathrm{o}{A}_\mathrm{m}}\,,$$
(13a)
$$\frac{\mathrm{d}{\tilde{n}}_\mathrm{s}}{\mathrm{d}t}=i{\sigma }_\mathrm{s}{\tilde{n}}_\mathrm{s}-\frac{{\tilde{\Gamma }}_\mathrm{s}}{{\Omega }_\mathrm{o}{A}_\mathrm{s}}\,,$$
(13b)

where Am and As are the mean equatorial moments of inertia of the mantle and inner core, respectively, es is the dynamical ellipticity of the inner core, and α3, αg and κs are gravitational coupling parameters. Numerical values for these parameters are given in Supplementary Table 3. σcw denotes (as in equation (8)) the complex frequency of the Chandler wobble, and \(\scriptstyle{\sigma }_\mathrm{s}=\frac{2\uppi }{{T}_\mathrm{icw}}\left(1+\frac{i}{2{Q}_\mathrm{icw}}\right)\) denotes that of the inner core wobble (ICW) with period Ticw and quality factor Qicw. The equations for the individual components xp, yp and xs, ys are written as

$$p\frac{\mathrm{d}{x}_\mathrm{p}}{\mathrm{d}t}+q\frac{\mathrm{d}{y}_\mathrm{p}}{\mathrm{d}t}+{x}_\mathrm{p}=-\xi q{x}_\mathrm{s}+\xi p{y}_\mathrm{s}+{\chi }_{1}^{\;\mathrm{cmb}},$$
(14a)
$$q\frac{\mathrm{d}{x}_\mathrm{p}}{\mathrm{d}t}-p\frac{\mathrm{d}{y}_\mathrm{p}}{\mathrm{d}t}-{y}_\mathrm{p}=\xi p{x}_\mathrm{s}+\xi q{y}_\mathrm{s}+{\chi }_{2}^{\;\mathrm{cmb}},$$
(14b)
$${p}_\mathrm{s}\frac{\mathrm{d}{x}_\mathrm{s}}{\mathrm{d}t}+{q}_\mathrm{s}\frac{\mathrm{d}{y}_\mathrm{s}}{\mathrm{d}t}+{x}_\mathrm{s}={\chi }_{1}^{\;\mathrm{icb}},$$
(14c)
$${q}_\mathrm{s}\frac{\mathrm{d}{x}_\mathrm{s}}{\mathrm{d}t}-{p}_\mathrm{s}\frac{\mathrm{d}{y}_\mathrm{s}}{\mathrm{d}t}-{y}_\mathrm{s}={\chi }_{2}^{\;\mathrm{icb}},$$
(14d)

where p = 0.2027, q = 68.9135, \(\xi =1.0513\times {10}^{-6}\) and

$${p}_\mathrm{s}=\frac{{T}_\mathrm{icw}}{2\uppi }\frac{1}{1+\frac{1}{4{Q}_\mathrm{icw}^{2}}}\frac{1}{2{Q}_\mathrm{icw}}\,,\quad \quad \quad {q}_\mathrm{s}=\frac{{T}_\mathrm{icw}}{2\uppi }\frac{1}{1+\frac{1}{4{Q}_\mathrm{icw}^{2}}}.$$
(15)
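The quoted values of p and q can be checked numerically, on the assumption that they follow the same form as equation (15) with the Chandler wobble values Tcw = 433 days and Qcw = 170 in place of Ticw and Qicw. The sketch below also verifies that dividing the homogeneous part of equation (13a) by iσcw yields the left-hand sides of equations (14a,b). The function name `wobble_coefficients` and the test values for xp, yp and their derivatives are arbitrary choices made for this check.

```python
import math


def wobble_coefficients(period, quality):
    """Return (p, q) in the form of equation (15) for a given period and Q."""
    q = period / (2.0 * math.pi) / (1.0 + 1.0 / (4.0 * quality**2))
    p = q / (2.0 * quality)
    return p, q


# Chandler wobble values quoted in the text: T_cw = 433 days, Q_cw = 170.
p_cw, q_cw = wobble_coefficients(433.0, 170.0)

# Cross-check against the complex frequency: 1/sigma should equal q - i p.
sigma_cw = 2.0 * math.pi / 433.0 * (1.0 + 1j / (2.0 * 170.0))
inv_sigma = 1.0 / sigma_cw

# Arbitrary test values for x_p, y_p and their time derivatives.
x, y, dx, dy = 12.0, -7.0, 0.03, 0.05
P_complex = x - 1j * y
dP_complex = dx - 1j * dy

# Homogeneous part of equation (13a), rearranged: -(dP/dt - i sigma P)/(i sigma).
g = -(dP_complex - 1j * sigma_cw * P_complex) / (1j * sigma_cw)

# Left-hand sides of equations (14a) and (14b).
lhs_14a = p_cw * dx + q_cw * dy + x
lhs_14b = q_cw * dx - p_cw * dy - y

print(f"p = {p_cw:.4f} days, q = {q_cw:.4f} days")
```

The real and imaginary parts of `g` match `lhs_14a` and `lhs_14b`, and the computed coefficients round to the quoted p = 0.2027 and q = 68.9135, which suggests that p and q are expressed in days.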

\({\chi }_{1}^\mathrm{cmb},\,{\chi }_{2}^\mathrm{cmb}\) and \({\chi }_{1}^\mathrm{icb},\,{\chi }_{2}^\mathrm{icb}\) are the excitation functions at the CMB and ICB, respectively. These excitation functions contain the torques at the CMB and ICB. \({\chi }_{1,2}^\mathrm{cmb}\) (\({\chi }_{1,2}^\mathrm{icb}\)) further capture all processes that induce polar motion (an inner core tilt) through a change in the core (inner core) inertia tensor.

We assign neural networks M9, M10 to the components of inner core tilt xs, ys and neural networks M11, M12, M13, M14 to \({\chi }_{1}^\mathrm{cmb},\,{\chi }_{2}^\mathrm{cmb},\,{\chi }_{1}^\mathrm{icb},\,{\chi }_{2}^\mathrm{icb}\), respectively. None of these quantities are known from observations, so M9–M14 are left unconstrained; in other words, these PINNs are not trained on a specific dataset other than polar motion. However, they must minimize the loss function of the core dynamics processes, which has four terms, one for each of equations (14a–d). The loss function for core dynamics processes is written as

$$\begin{array}{l}{{\mathrm{Loss}}}\,\left({{\mathrm{core}} \,{\mathrm{dynamics}}}\right)=\\ \frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left(p\frac{\mathrm{d}{M}_{1}({t}_{\!j})}{\mathrm{d}t}+q\frac{\mathrm{d}{M}_{2}({t}_{\!j})}{\mathrm{d}t}+{M}_{1}({t}_{\!j})+\xi q{M}_{9}({t}_{\!j})-\xi p{M}_{10}({t}_{\!j})-{M}_{11}({t}_{\!j})\right)}^{2}\\+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left(q\frac{\mathrm{d}{M}_{1}({t}_{\!j})}{\mathrm{d}t}-p\frac{\mathrm{d}{M}_{2}({t}_{\!j})}{\mathrm{d}t}-{M}_{2}({t}_{\!j})-\xi p{M}_{9}({t}_{\!j})-\xi q{M}_{10}({t}_{\!j})-{M}_{12}({t}_{\!j})\right)}^{2}\\+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left({p}_\mathrm{s}\frac{\mathrm{d}{M}_{9}({t}_{\!j})}{\mathrm{d}t}+{q}_{s}\frac{\mathrm{d}{M}_{10}({t}_{\!j})}{\mathrm{d}t}+{M}_{9}({t}_{\!j})-{M}_{13}({t}_{\!j})\right)}^{2}+\\ \frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left({q}_\mathrm{s}\frac{\mathrm{d}{M}_{9}({t}_{\!j})}{\mathrm{d}t}-{p}_{\mathrm{s}}\frac{\mathrm{d}{M}_{10}({t}_{\!j})}{\mathrm{d}t}-{M}_{10}({t}_{\!j})-{M}_{14}({t}_{\!j})\right)}^{2}.\end{array}$$
(16)

The period Ticw and quality factor Qicw of the ICW are two additional, a priori unknown parameters, which are initialized randomly and subsequently optimized via our PINNs.
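To illustrate how Ticw and Qicw enter equation (16) as adjustable parameters, the sketch below evaluates the four residuals with ps and qs recomputed from equation (15) on each call. It is a NumPy illustration under stated assumptions rather than the training code: finite differences stand in for automatic differentiation, the network outputs are plain arrays, ξ is read as 1.0513 × 10⁻⁶, and the ICW period and quality factor used in the check (2,400 days and 100) are hypothetical values chosen only to exercise the function.

```python
import math
import numpy as np

P, Q, XI = 0.2027, 68.9135, 1.0513e-6  # values quoted in the text


def icw_coefficients(t_icw, q_icw):
    """p_s and q_s from equation (15)."""
    q_s = t_icw / (2.0 * math.pi) / (1.0 + 1.0 / (4.0 * q_icw**2))
    return q_s / (2.0 * q_icw), q_s


def core_dynamics_loss(t, m1, m2, m9, m10, m11, m12, m13, m14, t_icw, q_icw):
    """Sum of the four mean-squared residuals of equation (16)."""
    p_s, q_s = icw_coefficients(t_icw, q_icw)
    d1, d2 = np.gradient(m1, t), np.gradient(m2, t)
    d9, d10 = np.gradient(m9, t), np.gradient(m10, t)
    r_a = P * d1 + Q * d2 + m1 + XI * Q * m9 - XI * P * m10 - m11
    r_b = Q * d1 - P * d2 - m2 - XI * P * m9 - XI * Q * m10 - m12
    r_c = p_s * d9 + q_s * d10 + m9 - m13
    r_d = q_s * d9 - p_s * d10 - m10 - m14
    return sum(np.mean(r**2) for r in (r_a, r_b, r_c, r_d))


# Synthetic fields built to satisfy equations (14a-d) for the hypothetical ICW.
T_TRUE, Q_TRUE = 2400.0, 100.0  # hypothetical period (days) and quality factor
t = np.linspace(0.0, 3650.0, 400)
m1 = 10.0 * np.sin(2 * np.pi * t / 2000.0)
m2 = 5.0 * np.cos(2 * np.pi * t / 3000.0)
m9 = 0.5 * np.sin(2 * np.pi * t / 2500.0)
m10 = 0.5 * np.cos(2 * np.pi * t / 2500.0)
p_s, q_s = icw_coefficients(T_TRUE, Q_TRUE)
d1, d2 = np.gradient(m1, t), np.gradient(m2, t)
d9, d10 = np.gradient(m9, t), np.gradient(m10, t)
m11 = P * d1 + Q * d2 + m1 + XI * Q * m9 - XI * P * m10
m12 = Q * d1 - P * d2 - m2 - XI * P * m9 - XI * Q * m10
m13 = p_s * d9 + q_s * d10 + m9
m14 = q_s * d9 - p_s * d10 - m10

fields = (t, m1, m2, m9, m10, m11, m12, m13, m14)
loss_true = core_dynamics_loss(*fields, T_TRUE, Q_TRUE)
loss_wrong = core_dynamics_loss(*fields, 3000.0, Q_TRUE)
```

The loss is zero at the period used to build the fields and positive for a different period, so a gradient-based optimizer can in principle adjust Ticw and Qicw alongside the network weights.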

Modelling seismic processes

Individual earthquakes induce a sudden change in the moment of inertia tensor, resulting in a sudden shift (kink) in polar motion19,20,21,59. Their associated excitation functions can be modelled as a series of step functions, and their cumulative effect results in a secular polar motion drift. In addition to coseismic deformations, interseismic deformations also contribute to polar motion but are more difficult to model19,20. We compute the excitation functions \({\chi }_{1}^\mathrm{eq}\) and \({\chi }_{2}^\mathrm{eq}\) associated with seismic deformations on the basis of the dislocation theory60 and using the fault geometry and moment tensors of individual earthquakes provided by the Centroid Moment Tensor catalogue61,62 from 1976 onward. Details on the method are presented in Supplementary Materials 5. The resulting excitation functions \({\chi }_{1}^\mathrm{eq}\) and \({\chi }_{2}^\mathrm{eq}\) are shown in Extended Data Fig. 6 (also Supplementary Fig. 12). We train neural networks M15 and M16 to learn the excitation functions \({\chi }_{1}^\mathrm{eq}\) and \({\chi }_{2}^\mathrm{eq}\). These neural networks are meant to capture the slow, gradual trend in the excitation functions from the cumulative effect of earthquakes, not the sudden excitation induced by individual events. \({\chi }_{1}^\mathrm{eq}\) and \({\chi }_{2}^\mathrm{eq}\) generate polar motion through the Liouville equation. Therefore, we define the loss function for seismic processes as

$$\begin{array}{l}{{\mathrm{Loss}}}\,\left({{\mathrm{seismic}}}\right)=\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left({\chi }_{1}^\mathrm{eq}({t}_{\!j})-{M}_{15}({t}_{\!j})\right)}^{2}+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left({\chi }_{2}^\mathrm{eq}({t}_{\!j})-{M}_{16}({t}_{\!j})\right)}^{2}\\+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left(p\frac{\mathrm{d}{M}_{1}({t}_{\!j})}{\mathrm{d}t}+q\frac{\mathrm{d}{M}_{2}({t}_{\!j})}{\mathrm{d}t}+{M}_{1}({t}_{\!j})-\frac{{k}_\mathrm{s}}{{k}_\mathrm{s}-{k}_{2}}\big(1+{k}_{2}^{{\prime} }\big){M}_{15}({t}_{\!j})\right)}^{2}\\+\frac{1}{N}\mathop{\sum}\limits_{j=1}^{N}{\left(q\frac{\mathrm{d}{M}_{1}({t}_{\!j})}{\mathrm{d}t}-p\frac{\mathrm{d}{M}_{2}({t}_{\!j})}{\mathrm{d}t}-{M}_{2}({t}_{\!j})-\frac{{k}_\mathrm{s}}{{k}_\mathrm{s}-{k}_{2}}\big(1+{k}_{2}^{{\prime} }\big){M}_{16}({t}_{\!j})\right)}^{2}.\end{array}$$
(17)

where p = 0.2027, q = 68.9135.
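The description of seismic excitation as a series of step functions whose cumulative effect is a slow drift can be sketched as follows. The event times and step sizes are hypothetical and in arbitrary units, not taken from any catalogue; the function name `cumulative_step_excitation` is likewise illustrative; and a straight-line least-squares fit is used here only as a simple stand-in for the smooth trend that a network would retain.

```python
import numpy as np


def cumulative_step_excitation(t, event_times, event_steps):
    """Sum of Heaviside steps: each event adds its step from its time onward."""
    t = np.asarray(t, dtype=float)[:, None]
    times = np.asarray(event_times, dtype=float)[None, :]
    steps = np.asarray(event_steps, dtype=float)[None, :]
    return np.sum(steps * (t >= times), axis=1)


# Hypothetical events (times in years, steps in arbitrary units).
event_times = np.array([1980.0, 1990.0, 2000.0, 2010.0])
event_steps = np.array([0.2, 0.1, 0.6, 0.4])

t = np.linspace(1976.0, 2018.0, 421)
chi_eq = cumulative_step_excitation(t, event_times, event_steps)

# Straight-line fit through the staircase: the gradual trend that remains
# once the individual kinks are smoothed over.
slope, intercept = np.polyfit(t, chi_eq, 1)
```

The staircase `chi_eq` carries the sudden shifts of individual events, whereas the fitted line captures only their accumulated drift, which is the part of the signal that M15 and M16 are meant to learn.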