Abstract
The causes of the middle Pleistocene transition (MPT) – the onset of large-amplitude glacial variability on a 100 kyr time scale in place of the earlier regular 41 kyr cycles – remain a challenging puzzle in paleoclimatology. Here we show how a Bayesian data analysis based on machine learning approaches can help to reveal the main mechanisms underlying the Pleistocene variability, namely those that most likely explain the proxy records, and can be used for testing existing theories. We construct a Bayesian data-driven model from benthic δ^{18}O records (LR04 stack) accounting for the main factors which may potentially impact the climate of the Pleistocene: internal climate dynamics, gradual trends, variations of insolation, and millennial variability. In contrast to some theories, we find that, under long-term trends in climate, the strong glacial cycles appeared due to internal nonlinear oscillations induced by millennial noise. We also find that while the orbital Milankovitch forcing does not matter for the MPT onset, the obliquity oscillation phase-locks the climate cycles through the meridional gradient of insolation.
Introduction
The pronounced change in the glacial-interglacial regime that occurred about 1 million years ago – the so-called Middle Pleistocene transition (MPT) – is widely regarded as an evident manifestation of the climate system’s nonlinearity. The MPT is observed in various proxy records as a shift in glaciation periodicity (from 41 kyr to approximately 100 kyr) accompanied by both an increase in the ice/temperature oscillation amplitude and a change in the characteristic shape of the oscillations from almost symmetrical to a sawtooth shape with gradual coolings and rapid deglaciations (see Fig. 1(A,B) and refs^{1,2,3}). The most significant external forcing of climate – the insolation variations governed by the Earth’s orbital parameters (the Milankovitch forcing) – remained unchanged during the Pleistocene. Hence, the MPT must be closely connected with the internal properties of climate and their possible response to large-scale changes of the environment. There are ongoing discussions concerning the mechanisms of the MPT and the roles that different orbital parameters and natural climate variability play in it. The problem is that the climate is a complex high-dimensional system with various nonlinear feedbacks; therefore, to identify the mechanisms one should single out the most important subsystems driving such changes. This is a very difficult task because the different models are hard to verify. Existing theories of the Pleistocene dynamics invoke various internal factors to explain the causes of the MPT, such as the ice-albedo, precipitation-temperature and sea level feedbacks, atmospheric and ocean circulation, the CO_{2} cycle, dust accumulation, etc. (cf. refs^{2,4}). Many dynamical mechanisms of glacial cycles have been suggested based on different conceptual models derived from simplified physical considerations.
In particular, they include relaxation oscillations arising under long-term trends in parameters (see the review^{5} of the corresponding models), nonlinear resonance to the orbital forcing^{6}, noise- and forcing-induced transitions between multiple steady states^{7}, chaotic response to the insolation forcing^{8}, stochastic resonance^{9}, etc. Several of the suggested models provide a good fit to the observed proxy records, but the ability of a model to reproduce data is certainly not sufficient for the verification of a theory. Tziperman et al.^{10} argue that the nonlinear phase-locking mechanism common to many nonlinear dynamical models can easily provide the correct output through synchronization of a model with the Milankovitch forcing, even if the physical mechanism built into the model is not correct.
To address this puzzle, we explore the Pleistocene glacial cycles by means of a Bayesian data analysis that reveals the model which is minimal but sufficient for describing the data. Mathematically, such a model has the highest probability of producing the proxy records we have and hence yields statistically justified inferences. The advantages of the Bayesian methodology in selecting the proper scenario underlying the paleoclimate observations were discussed, e.g., in ref.^{5}. Here we show how the data-driven model of the Pleistocene dynamics obtained from the Bayesian principles can be used for supporting or rejecting existing climatological theories.
We infer that nonlinear feedbacks in the climate system are the principal factors for the MPT, whereas the external forcing – the gradient of insolation – only paces the major deglaciations in the post-MPT climate. Thus, our analysis supports those theories that bring internal climate variability to the forefront, while those regarding the orbital oscillations as the main driver of the 100 kyr glacial-interglacial cycles are essentially rejected.
Results
First, we describe the data we use and the form of the dynamical model we suggest. After that, we show how the learned model captures the main properties of the Pleistocene dynamics. Then we use the model for analyzing the influences of different factors such as trends and forcings, as well as for studying the mechanism underlying the observed response. Finally, we present a prediction of climate cycles made by the model.
Data and data-driven model
For the purpose stated above we restrict our consideration to the widely used LR04 stack of benthic δ^{18}O records^{11}. This stack accumulates data from 57 sites scattered over the globe and reflects the global average of total climate changes. We took only the last part of the time series, beginning from 2.6 Ma, when the glacial-interglacial cycles became regular^{2}. Additionally, for technical reasons, we made this time series uniformly sampled with a 2.5 kyr time step by applying a 5 kyr sliding window (see Methods). This smoothing does not disturb the structure of the cycles much and only slightly reduces the short-term noise in the data. The time series used for analysis together with its wavelet transform are shown in Fig. 1(A,B).
The data-driven model is constructed in the form of a stochastic discrete dynamical system following refs^{12,13,14}.
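A form of Eq. 1 consistent with the description in this section (a reconstruction from the surrounding definitions, in which the deterministic part f takes the lagged states, the time t_{n} and the forcing q_{n}, and the noise amplitude g takes the lagged states) is:

```latex
X_n = f\left(X_{n-1}, \dots, X_{n-L}, t_n, q_n\right)
      + g\left(X_{n-1}, \dots, X_{n-L}\right)\,\zeta_n \qquad (1)
```

Here X_{n} denotes the δ^{18}O value at time t_{n} and ζ_{n} is the uncorrelated Gaussian noise; the subscript n on ζ is notation assumed for this reconstruction.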

(i) The first term here is the deterministic evolution operator (dynamical system) mapping some history of the climate’s states of duration L (the model dimension) to the next state. This term describes the internal dynamical properties of the system. The second term parameterizes the stochastic forcing of the model, which is needed to account for processes with time scales below the time resolution of our data, e.g. the millennial and centennial dynamics. Such “noise” was shown to play a crucial role in the ice ages (see e.g. refs^{5,7,9}), so the random perturbations are expected to be an essential part of the model. In the suggested model form such noise can be state-dependent due to the product of the function g and the uncorrelated Gaussian noise ζ. Both functions f and g are unknown a priori and are found by means of Bayesian machine learning techniques. In this work we define them via universal approximators (see Methods for details), which makes the form of the model Eq. 1 quite general and able to describe a wide class of dynamical systems.

(ii) Next, a number of theories of the MPT have proposed that the long-term Cenozoic cooling – e.g. a secular decrease in atmospheric pCO_{2}^{15}, global mean temperature^{16}, or deep water temperature^{17} – has brought the climate system to some critical transition after which the nonlinear oscillations of ice sheets became feasible. To reflect possible changes of this type, we make the deterministic part f of the model depend explicitly on time t_{n} through a modification of the standard functions used for its approximation (see Methods). Such a time-dependence allows us to study the slow evolution of the climate system in time and thus to reveal the dynamical mechanisms underlying the observed transitions. Moreover, it gives us an opportunity to extrapolate the model beyond the observations and predict the dynamical regime over some time interval into the future^{12,13,14}.

(iii) The last factor Eq. 1 depends on is the orbital forcing q – the Earth’s insolation variations. This signal is affected primarily by three astronomical parameters: precession of both the Earth’s axis and orbit, which together yield the 19 and 23 kyr spectral peaks; obliquity oscillations with a dominant period of 41 kyr; and less-powerful variations of the Earth’s orbit eccentricity contributing to time scales around 100 kyr (see power spectra of the insolation in Fig. S3). Although the insolation signals have the same spectral components over the globe, the relationships between different harmonics are latitude-dependent: in particular, the obliquity peaks, which are strong far from the equator, vanish in the tropics. We take this dependence into account by using a two-dimensional forcing q taken from the dataset described in ref.^{18}, consisting of July insolation time series at the tropical (15°N) and subpolar (65°N) latitudes.
The complexity of the model Eq. 1 is determined by a set of structural parameters – the dimension L and the numbers of nonlinear elements in the approximators of the functions f and g parameterizing the deterministic and stochastic parts, respectively (see Methods and Supplementary Information (SI)). For the selection of the optimal model structure, we use Bayesian methodology: the optimal model is the one that maximizes the probability of generating the time series at hand; see Methods for details and the practical implementation of the approach.
Regarding the LR04 dataset analysis presented here, the model Eq. 1 was trained with different sets of structural parameters. The resulting models were then compared according to the Bayesian criterion (see Methods and SI), and the 10 best models were considered further. All the models considered demonstrate qualitatively very similar results as well as the same mechanism of the MPT; therefore, hereinafter we illustrate the results using the best model only.
Dynamics of the model
The model Eq. 1 is a stochastic dynamical system, hence its output depends on the random variable ζ and can differ from run to run. Figure 1(C) shows the wavelet transform (WT) of the model averaged over an ensemble of 10,000 model runs with different noise realizations, as compared to the WT of the single LR04 time series (Fig. 1(B)). Due to averaging effects, the WT of the model looks more “homogeneous” than that of the data (and that of individual model time series; see an example in Fig. S1) and allows a clear identification of the MPT onset in the model behavior. The model reproduces the 41 kyr spectral line during the entire Pleistocene, associated with the linear response to the obliquity signal, as well as the shift of a substantial part of the spectral power to lower frequencies that culminates in the interval from 1.3 to 1 Ma – the main manifestation of the MPT. Beyond that, the model correctly reproduces the changing characteristic shape of the climate cycles, as shown in Fig. 2(A,B): nearly sinusoidal symmetric oscillations during the pre-MPT epoch and sawtooth long-period, large-amplitude motions with rapid deglaciations (δ^{18}O decreases) after the MPT.
The impact of stochastic forcing
To study the role of the random fluctuations (noise) in the model, which represent the subgrid millennial and centennial climate variability, we compare outputs of the model in three variants: (i) the full model containing all the forcings, (ii) the model without the insolation forcing, in which both forcing time series entering q are set to their mean values, and (iii) the noise-free model with g = 0. We find that the stochastic forcing is a crucial factor for the frequency-band change associated with the MPT: the transition to the 100 kyr scale occurs in the same way in the model without any insolation signal, whereas the noise-free model driven by insolation alone exhibits only a response to the 41 kyr obliquity oscillations throughout the entire Pleistocene (Fig. 1(D,E)). Thus, the internal dynamical properties of the model, the short-scale climate fluctuations and the model nonstationarity together induce the frequency shift in the δ^{18}O variations; by contrast, the variability in insolation is not instrumental in the MPT.
The resulting probability distribution of the noise in the model Eq. 1 is not constant in time due to the state-dependence of the noise amplitude g. The ratio of the “instantaneous” noise variance g^{2} to the variance of the LR04 time series is marked by color in Fig. 3(A), where the time series from a single randomly chosen model run is plotted. The increase in the millennial variability in colder climate seen in Fig. 3(A) agrees well with the findings of refs^{19,20}, which show an amplification of the millennial-scale climate variability with increasing continental ice mass. As shown below, such an increase in the subgrid fluctuations may contribute to the MPT mechanism but does not play a central role in it.
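The three variants compared here can be sketched with a toy stand-in for Eq. 1. The functions `f` and `g`, the sinusoidal forcing and all coefficients below are hypothetical placeholders chosen for illustration, with memory L = 1; they are not the fitted model. The sketch only shows how the insolation input and the noise are switched off.

```python
import math
import random

def f(x_prev, t, q):
    """Toy deterministic part (hypothetical): damping, forcing response, slow trend."""
    return 0.8 * x_prev + 0.3 * q + 0.0005 * t

def g(x_prev):
    """Toy state-dependent noise amplitude (hypothetical): grows with |state|."""
    return 0.1 + 0.05 * abs(x_prev)

def run(q_series, x0=0.0, noise=True, seed=0):
    """Iterate X_n = f(X_{n-1}, t_n, q_n) + g(X_{n-1}) * zeta_n with L = 1."""
    rng = random.Random(seed)
    x, out = x0, []
    for n, q in enumerate(q_series):
        zeta = rng.gauss(0.0, 1.0) if noise else 0.0  # noise-free variant: g * 0
        x = f(x, n, q) + g(x) * zeta
        out.append(x)
    return out

steps = 400
# Stand-in forcing: a sinusoid with a 41 kyr period sampled every 2.5 kyr.
q = [math.sin(2 * math.pi * 2.5 * n / 41.0) for n in range(steps)]
q_mean = [sum(q) / len(q)] * steps  # variant (ii): forcing replaced by its mean

full = run(q, noise=True, seed=1)               # (i) all forcings
no_insolation = run(q_mean, noise=True, seed=1)  # (ii) no insolation signal
noise_free = run(q, noise=False)                 # (iii) g effectively set to 0
```

Repeating `run` with different `seed` values gives an ensemble of realizations for variants (i) and (ii), while variant (iii) is fully deterministic.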
The impact of insolation forcing
Although the insolation signals q at the input of the model Eq. 1 comprise two distinct time series, we found that the optimal models depend only on a particular combination of these factors that is close to the difference between the tropical and subpolar insolation inputs (see SI). Thus, according to our analysis, the only insolation forcing that matters in the Pleistocene climate is the meridional gradient of the insolation, which is mainly affected by the obliquity oscillations (see Fig. S3). All the harmonics related to the precession and eccentricity forcings nearly disappear in the gradient and therefore cannot contribute to the observed dynamics. The insolation gradient was regarded as a main driver of the 41 kyr climate cycles in the pre-MPT epoch (from 3 Ma to 0.8 Ma), through its modulation of the atmospheric meridional heat and moisture fluxes^{21}. Our results provide further direct empirical evidence from the LR04 stack that this forcing remains dominant through the rest of the Pleistocene too.
Although the insolation forcing is not responsible for the MPT onset, it contributes significantly to the most powerful post-MPT glacial-interglacial cycles via the phase-locking mechanism, through which the onsets of the major deglaciations follow the maxima of the insolation gradient time series. To illustrate this, we considered an ensemble of major deglaciations in the model behavior, defined as a δ^{18}O decrease of 1 per mil per 20 kyr or faster and collected from 10,000 model time series over the considered time interval from 2.6 Ma to the present. For each event from this ensemble we calculated the time to the closest insolation gradient maximum as well as the time to the next deglaciation from the ensemble. The joint distribution density of these values shown in Fig. 2(C) indicates that the major deglaciations occur, on average, 10 kyr after the insolation gradient maxima and, consequently, that the periods of the glacial-interglacial cycles are close to multiples of the 41 kyr obliquity period. Moreover, most of the periods are distributed around the doubled and tripled obliquity period; this results in the near-100 kyr line in the mean model wavelet spectrum (Fig. 1(C)). The corresponding 10 major deglaciations from the LR04 stack are marked by circles in both the δ^{18}O time series (Fig. 1(A)) and the density plot in Fig. 2(C): they fall well within the model statistics of deglaciation times.
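The event statistics described above can be sketched as follows. This is one possible reading of the criterion, not the authors' procedure: the function names, the merging of consecutive detections into a single event, and the choice of the event time (the first window start that satisfies the criterion) are all assumptions made for illustration.

```python
def deglaciation_onsets(x, dt=2.5, window=20.0, drop=1.0):
    """Times (kyr from series start) at which x first falls by at least `drop`
    within the following `window` kyr; consecutive detections are merged."""
    k = int(round(window / dt))
    onsets, in_event = [], False
    for n in range(len(x) - k):
        hit = x[n] - min(x[n + 1:n + k + 1]) >= drop
        if hit and not in_event:
            onsets.append(n * dt)
        in_event = hit
    return onsets

def local_maxima(q, dt=2.5):
    """Times of the local maxima of a forcing series sampled every dt kyr."""
    return [n * dt for n in range(1, len(q) - 1) if q[n - 1] < q[n] >= q[n + 1]]

def lag_to_closest(t, maxima):
    """Signed time from the closest maximum to t (positive: t is after it)."""
    closest = min(maxima, key=lambda m: abs(t - m))
    return t - closest

# Synthetic sawtooth (made-up data): 80 kyr slow rise by 2 units, 10 kyr fall.
x = [2 * p / 32 if p <= 32 else 2 * (36 - p) / 4
     for p in (n % 36 for n in range(144))]
onsets = deglaciation_onsets(x)
intervals = [b - a for a, b in zip(onsets, onsets[1:])]
```

Pairing each entry of `intervals` with `lag_to_closest` for the same event gives the two coordinates of a joint distribution of the kind shown in Fig. 2(C).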
Dynamical mechanism of the MPT
To identify the mechanism underlying the model behavior described above, let us first look at the steady states of the deterministic part of the model X_{n} = f(X_{n−1}, … X_{n−L}, t_{n}, q_{n}) at different constant values of the insolation forcing q. Figure 3(B) shows the time evolution of the steady states at different insolation gradient levels, representing a “skeleton” of the obtained model dynamics. While at low insolation gradient values, corresponding to warmer climate, the model is stable, the stability of the steady state decreases as the insolation gradient grows, and eventually the steady state becomes unstable at high insolation gradient levels. Nevertheless, the deterministic model under the quasi-periodic insolation gradient always stays near the “warm” stable state: the relatively short period of the forcing prevents it from going far from this state in spite of the epochs of instability. Instead, the model state just moves slightly along with the insolation gradient oscillations. This is why, in the absence of the stochastic part, the model demonstrates the 41 kyr response plotted in Figs 1(E) and 2(A) during the whole Pleistocene.
The behavior changes essentially if the millennial processes are switched on. Due to the trend in the model, the steady states corresponding to stable and unstable epochs of climate converge with time (as seen e.g. from Fig. 3(B)); simultaneously, the stability of the warm climate states decreases and the amplitude of the millennial noise increases (Fig. 3(A)). Eventually, starting from the middle Pleistocene, the noise is able to push the model farther from the steady state, into the region where the response of the model is essentially nonlinear: fast relaxations from the cold to the warm climate and slower backward motions occur. As a result, in the full model with both millennial and insolation forcings we see the onset of the regime with noise-induced sawtooth oscillations (Fig. 2(B)) associated with the MPT.
In general, the insolation forcing is not necessary for the MPT: the decreased stability of the warm steady state allows the noise to provide the transition starting from the average insolation gradient level (see e.g. Fig. 1(D)). However, the insolation gradient paces the glacial-interglacial cycles in the following way. The fastest stages of the glacial-interglacial oscillations (i.e., the rapid deglaciations) tend to occur when the climate is cold enough (high δ^{18}O level) and, at the same time, the insolation gradient has passed its maximal phase and is approaching the average. Under such conditions, the glacial climate is forced to go rapidly toward the warm steady state which appears in the model at the average insolation gradient level. Together these conditions determine the phase locking of the major deglaciations to the obliquity signal, which we observe in the model dynamics as well as in the LR04 time series (Fig. 2(C)). This mechanism is further explained in the SI (the text and Figs S4–S6).
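The steady-state analysis can be illustrated with a toy deterministic map. The function `f`, its coefficients, the memory L = 1 and the bracket used for root finding are hypothetical; the map is built only to mimic qualitatively a steady state that loses stability as the forcing level q grows, and it is not the fitted model. A steady state satisfies X = f(X, q), and for L = 1 it is stable when the slope of f with respect to X has magnitude below 1.

```python
import math

def f(x, q):
    """Toy deterministic part with L = 1 and the time dependence frozen (hypothetical)."""
    return 0.5 * x + q * math.tanh(x) + 0.1

def steady_state(q, lo=-1.0, hi=1.0, tol=1e-12):
    """Solve x = f(x, q) by bisection on h(x) = f(x, q) - x within [lo, hi]."""
    h = lambda x: f(x, q) - x
    h_lo = h(lo)
    if h_lo * h(hi) > 0:
        raise ValueError("no sign change in the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h_lo * h(mid) <= 0:
            hi = mid
        else:
            lo, h_lo = mid, h(mid)
    return 0.5 * (lo + hi)

def is_stable(x, q, eps=1e-6):
    """Stability of a fixed point of a one-dimensional map: |df/dx| < 1."""
    slope = (f(x + eps, q) - f(x - eps, q)) / (2 * eps)
    return abs(slope) < 1.0

x_low = steady_state(0.2)   # low forcing level
x_high = steady_state(0.8)  # high forcing level
print(is_stable(x_low, 0.2), is_stable(x_high, 0.8))
```

For L > 1 the same idea applies with the lagged states all set equal to X, and stability is decided by the eigenvalues of the linearized map rather than by a single slope.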
Model prediction
Finally, let us look at what the obtained model predicts when it is simply extrapolated into the future under the assumption that the detected trend stays the same. Note that this assumption may not be realistic, since there is no clear evidence of ongoing long-term cooling even in the late Pleistocene^{1}. It is even more suspect today, when anthropogenic warming contributes to shifting the climatic means. Still, if such a trend held steady in the future, the large glaciations in the model would still occur, since the warm and glacial states would continue to converge, as Fig. S7 indicates. While the amplitude of the “future” climate variability is almost unchanged in time, an additional short-scale cycle (with a period of around 20 kyr) appears in the model time series independently of the insolation forcing (see the WT for the model extrapolations with and without the insolation forcing in Fig. S7).
Discussion
The optimal Bayesian data-driven model derived here from the LR04 stack shows that strong nonlinear feedbacks in climate, gradual trends of global cooling, and stochastic and insolation (obliquity) forcings are all important for various aspects of the MPT. From a dynamical point of view, the MPT is shown to be generated by a long-term trend in climate leading to a noise-induced buildup of nonlinear oscillations. This trend makes the warm steady states less stable and hence allows the climate to reach colder states. As a result, the large-amplitude glacial-interglacial climate oscillations with slow onset and rapid termination became more accessible in the late Pleistocene; this indicates that the large glaciations started to be triggered at higher temperatures than in the pre-MPT epoch. This conclusion indirectly supports the hypothesis put forth in ref.^{17} about the leading role of a gradual deep ocean cooling in the MPT, through decreasing the ocean heat capacity and hence allowing sea ice to grow at higher atmospheric temperatures.
However, the strongly nonlinear relaxation oscillations responsible for the large sawtooth glacial-interglacial cycles cannot arise in our model without energetic millennial (and/or centennial) climate variability, represented by a stochastic process. Hence, data-driven modeling of such short-scale variability (e.g. Dansgaard-Oeschger, Heinrich, and Bond oscillations), in connection with models of the large glacial-interglacial cycles, could be helpful for further clarification of the mechanisms underlying the Pleistocene dynamics.
Regarding the insolation forcing, we have found that only the insolation gradient, driven primarily by the obliquity oscillations, is important for the Pleistocene dynamics. For the pre-MPT 41 kyr world the principal role of the insolation gradient was explained in ref.^{21}, but the physical mechanism for its dominance in the late Pleistocene (shown by our model) is not presently clear. According to ref.^{22}, the glacial oscillation is phase-locked with the Milankovitch forcing via the high-latitude insolation: lower insolation leads to larger glacier growth and makes the sea ice switches (rapid expansions of sea ice), which are followed by the deglaciation stages, more probable than at higher insolation values. But the high-latitude insolation signal is dominated by the precession rather than the obliquity oscillation, whereas our analysis shows that the impacts of both precession and eccentricity forcings are negligible. To reconcile these differences, we conjecture that the tropical insolation can also be important in this mechanism. Above-average tropical insolation in combination with low high-latitude insolation may result in an intensification of moisture transport to high latitudes and a related increase in the ice accumulation rate. The latter prolongs the stability of continental ice, allowing temperatures to reach much lower values through the ice-albedo feedback. Eventually, all the behavior explained in refs^{4,17,22} still happens – the low temperatures, the large sea ice extent, and the reduced precipitation followed by the retreat of the glaciers – but the deglaciation stage starts from much colder conditions, yielding higher amplitudes of the glacial cycles. Thus, the major deglaciations are tied to the positive phase of the oscillation of the low-high latitude insolation difference (i.e. the insolation gradient), providing a phase locking of the climate cycles with the obliquity cycles. This is probably the direction in which the conceptual models could be modified.
All the conclusions made in this work are inferred from the analysis of the LR04 stack, which only represents the global climate changes during the Pleistocene. Therefore the dynamical model derived from this data set can only describe the salient properties of the climate system and necessarily lacks potentially important regional details. In particular, it was shown^{3} that the Ocean Drilling Program records in the Southwest Pacific Ocean – an important region for studying the global ocean circulation – indicate a much more abrupt change in δ^{18}O from 950 ka to 870 ka compared to the much smoother globally-averaged changes. Differences in regional climate variability can be important for more detailed studies of glacial mechanisms, and future work on data-driven modeling of paleoclimate should focus on the analysis of multivariate time series over the globe.
Another question beyond the scope of this work concerns the reversibility of the MPT: is it possible to bring the climate back to the 41-kyr world by reversing the long-term trend in the system? In principle, the dynamics of the model obtained here is reversible in this sense: if we initialize the model with current conditions and run the sequence {t_{n}} in Eq. 1 backward, we get the completely reversed scenario of the MPT, as shown in Fig. S8. This means that structurally there is no hysteresis, or multistability, in the model on the time interval corresponding to the MPT. But the problem is that the LR04 time series does not provide sufficient information for identifying the physical processes contributing to the trend itself, and some of them may be irreversible, e.g. the glacial erosion of regolith, which has been suggested as a primary factor for large glaciations^{1}. Further understanding and modeling of the causes of such long-term changes of the system may shed new light on this question.
Methods
In this section we first describe the specific form of the data-driven model we use. Then we briefly explain the representation of the structural parts of the model, the algorithm for model learning and optimization, and the data preparation procedure. Details of the methods for representation, learning and optimization of the model in the form of Eq. 1 can be found in refs^{12,13,14,23,24}.
Model structure
The stochastic evolution operator (EO) we reconstruct from data (Eq. 1) explicitly depends on time as well as on the forcing q. The dependence on time is needed for parameterizing possible deformations of the EO due to slow trends in climatic conditions. The response of the system to such trends can be essentially state-dependent. Hence, the time variable t should be passed to the input of the model together with the state variables X. Likewise, the response of the system to the forcing (the insolation signal in the present case) can be state-dependent as well, so the forcing q should also be included as a dynamic variable. However, we should exclude implausible models which allow the astronomically driven forcing q to respond to the climatic trends or, vice versa, permit a direct impact of the stationary forcing on much longer-scale trends. To this end, we split the deterministic part f of the model into two terms, one responsible for the impact of the trend and the other for the impact of the insolation forcing:
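A form of Eq. 2 consistent with this description and with the inputs of f_{1} and f_{2} given in the Model representation section (a reconstruction from the surrounding definitions) is:

```latex
f\left(X_{n-1}, \dots, X_{n-L}, t_n, q_n\right)
  = f_1\left(X_{n-1}, \dots, X_{n-L}, t_n\right)
  + f_2\left(X_{n-1}, \dots, X_{n-L}, q_n\right) \qquad (2)
```

In this form the time t_{n} enters only f_{1} and the forcing q_{n} enters only f_{2}, so neither term mixes the trend with the insolation signal.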
Model representation
For the parameterization of the a priori unknown functions f_{1}, f_{2} and g in Eqs 1 and 2 we use a simple artificial neural network (ANN) in the form of a perceptron with one hidden layer and a hyperbolic tangent activation function, which is known to be a universal approximator^{25}:
Here m is the number of neurons, (α, ω, γ) are the fitted coefficients (the model parameters) of each neuron, and z is an input of some dimension d. Since we analyze the single LR04 time series, hereinafter we use the scalar variant of the function φ: R^{d} → R, so that for each neuron α ∈ R, ω ∈ R^{d}, γ ∈ R. While the functions f_{1} and g take the vector of Takens variables^{26} z = (X_{n−1}, … X_{n−L}) as an input (i.e. d = L), the combined vector z = (X_{n−1}, … X_{n−L}, q_{n}) is passed to the input of the function f_{2}, so d = L + 2 for f_{2} due to the two-dimensional forcing q (see the main text).
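A standard one-hidden-layer perceptron of this kind, written with the symbols defined in the text (a reconstruction; the neuron index i is notation assumed here), reads:

```latex
\varphi(z) = \sum_{i=1}^{m} \alpha_i \tanh\left(\omega_i^{T} z + \gamma_i\right) \qquad (3)
```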
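A minimal sketch of such a scalar perceptron is given below. It assumes the standard form, a sum over m neurons of α_i tanh(ω_i · z + γ_i); the function name `phi` and the list-based layout of the coefficients are illustrative choices rather than the authors' implementation.

```python
import math

def phi(z, alpha, omega, gamma):
    """Scalar one-hidden-layer perceptron with tanh activation.

    z      input vector of dimension d
    alpha  m output weights
    omega  m weight vectors, each of dimension d
    gamma  m biases
    """
    return sum(
        a * math.tanh(sum(w_j * z_j for w_j, z_j in zip(w, z)) + c)
        for a, w, c in zip(alpha, omega, gamma)
    )

# Example with m = 2 neurons and d = 3 inputs (arbitrary numbers).
value = phi([0.5, -1.0, 2.0],
            alpha=[1.0, -0.5],
            omega=[[0.1, 0.2, 0.3], [0.0, 1.0, 0.0]],
            gamma=[0.0, 0.5])
```

For f_{1} and g the input would be the L lagged values, and for f_{2} the L lagged values followed by the two insolation components.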
The explicit dependence of the deterministic model (Eq. 2) on time t is introduced in f_{1} by means of a modified ANN structure (Eq. 3) as follows:
In fact, this is the first-order (linear) expansion of a weak time-dependence of the model caused by slow trends in the system. It was shown in refs^{27,28} that such an approximation is efficient for modeling and predicting low-dimensional nonlinear dynamical systems of general type with slowly changing parameters.
Model learning and optimization
Let us denote the model parameters entering the functions f_{1}, f_{2} and g by μ_{f_{1}}, μ_{f_{2}} and μ_{g}, respectively (each of these μ consists of the corresponding ANN coefficients). To determine the parameters, we use a cost function in the form of the Bayesian posterior probability density function (PDF):
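By Bayes' rule, the posterior PDF referred to here factorizes into a likelihood and a prior; a form consistent with the description that follows (a reconstruction, written up to a normalization constant) is:

```latex
P\left(\mu_{f_1}, \mu_{f_2}, \mu_g \mid X, q\right) \propto
  P\left(X \mid \mu_{f_1}, \mu_{f_2}, \mu_g, q\right)\,
  P\left(\mu_{f_1}, \mu_{f_2}, \mu_g\right) \qquad (5)
```

Since the text later speaks of minimizing a function Ψ(μ) identified with this cost function, it is assumed here that Ψ is the negative logarithm of the right-hand side.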
The first term on the right-hand side is the likelihood function – the PDF of obtaining the data X = (X_{1}, … X_{N}) of length N from the model with parameters (μ_{f_{1}}, μ_{f_{2}}, μ_{g}) given the forcing time series q = (q_{1}, … q_{N}). This function can be inferred directly from Eqs 1 and 2 on the assumption that the random process ζ in Eq. 1 is Gaussian and uncorrelated:
where C is a constant depending on the starting fragment X_{1}, … X_{L} of the time series X (see refs^{23,24}). The second term in Eq. 5 is the prior PDF of the model parameters. This function restricts the domain of model learning in the space of parameters, thus compensating for the degeneracy of the ANN parameter space and simplifying the numerical analysis. Following refs^{12,13,14,23,24,27}, we use a Gaussian prior PDF with different variances for different groups of ANN coefficients (α, ω, and γ).
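For Gaussian uncorrelated ζ, each step of Eq. 1 contributes a normal density with mean given by the deterministic part and standard deviation given by the noise amplitude. A form of Eq. 6 consistent with this (a reconstruction; f_{n} and g_{n} are shorthand introduced here for the values of f and g at step n) is:

```latex
P\left(X \mid \mu_{f_1}, \mu_{f_2}, \mu_g, q\right)
  = C \prod_{n=L+1}^{N} \frac{1}{\sqrt{2\pi g_n^{2}}}
    \exp\left(-\frac{\left(X_n - f_n\right)^{2}}{2 g_n^{2}}\right) \qquad (6)
```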
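The two ingredients of the cost function can be sketched in a few lines, working with negative logarithms so that products become sums. The function names are illustrative, and the zero mean of the prior and the use of one standard deviation per coefficient group are assumptions made here; the text specifies only that the prior is Gaussian with group-dependent variances.

```python
import math

def neg_log_likelihood(x, f_vals, g_vals):
    """-ln of the Gaussian likelihood, omitting the constant -ln C.

    x[n] is the observed value, f_vals[n] the deterministic prediction and
    g_vals[n] the noise amplitude for the same step n.
    """
    total = 0.0
    for xn, fn, gn in zip(x, f_vals, g_vals):
        total += 0.5 * math.log(2 * math.pi * gn ** 2)
        total += (xn - fn) ** 2 / (2 * gn ** 2)
    return total

def neg_log_gaussian_prior(params, sigma):
    """-ln of a zero-mean Gaussian prior (assumed form) for one group of
    coefficients with standard deviation sigma, up to an additive constant."""
    return sum(p ** 2 for p in params) / (2 * sigma ** 2)

def cost(x, f_vals, g_vals, groups):
    """Negative log posterior: likelihood term plus one prior term per group.

    groups is a list of (params, sigma) pairs, e.g. one pair each for the
    alpha, omega and gamma coefficients.
    """
    prior = sum(neg_log_gaussian_prior(p, s) for p, s in groups)
    return neg_log_likelihood(x, f_vals, g_vals) + prior
```

Minimizing `cost` over the ANN coefficients corresponds to maximizing the posterior PDF.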
A crucial point in data-driven modeling is selecting a model structure of optimal complexity in the sense of its correspondence to the data. The Bayesian way of selecting the optimal model is to find the model that maximizes the marginal likelihood function, which characterizes the probability that the model produces the data X:
Here μ are the internal parameters of a particular model H_{i}, P(μ|H_{i}) is the prior probability density for them, and P(X|μ, H_{i}) is the likelihood for the parameters of the i-th model. The predefined set of models {H_{i}} should be wide enough to incorporate as many physically relevant evolution operators of the system as possible. Using the condition Eq. 7 as a criterion for best model selection prevents us from obtaining overfitted models that fit the particular observations well (and hence yield large values of the likelihood in Eq. 7) but are useless for inferring robust dynamical laws underlying the data.
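The marginal likelihood is the likelihood averaged over the prior of the model parameters; a form of Eq. 7 consistent with the symbols explained in the text (a reconstruction) is:

```latex
P\left(X \mid H_i\right)
  = \int P\left(X \mid \mu, H_i\right)\, P\left(\mu \mid H_i\right)\, d\mu
  \;\to\; \max_{i} \qquad (7)
```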
The structural parameters determining the complexity of our model are the time lag L defining the model memory and the numbers of neurons m_{f_{1}}, m_{f_{2}} and m_{g} in the ANNs representing the deterministic (f_{1} and f_{2}) and stochastic (g) parts of the model. Thus the ensemble of models used for the best model selection consists of models with different structural parameter sets (L, m_{f_{1}}, m_{f_{2}}, m_{g}). The procedure we use here for the estimation of the integral in Eq. 7 is the same as in refs^{23,24}: for every model H_{i} from the ensemble of (L, m_{f_{1}}, m_{f_{2}}, m_{g}), the following function derived by the Laplace integration method is calculated:
Here μ_{0} denotes the model parameters minimizing the function Ψ(μ), i.e. the cost function Eq. 5, and ∇∇^{T}Ψ is the matrix of second derivatives (Hessian matrix) of the function Ψ with respect to the parameters μ at the point of its minimum μ_{0}. While the value of the first term in the upper row of Eq. 8 indicates how well the model outputs fit the data, the second term penalizes the model for its complexity. Thus, the minimization of −ln P(X|H_{i}) provides a balance between the fit accuracy and the model complexity.
Eventually, to obtain the best model, we have to learn each model from the ensemble, which gives us the values of μ_{0}, calculate the optimality (Eq. 8), and select the model that minimizes the optimality.
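The standard Laplace approximation expands Ψ to second order around its minimum and integrates the resulting Gaussian. A form of Eq. 8 consistent with that method and with the two-row layout mentioned in the text (a reconstruction; the exact arrangement of the original is an assumption) is:

```latex
\begin{aligned}
-\ln P\left(X \mid H_i\right) &\approx \Psi(\mu_0)
  + \frac{1}{2} \ln \det\left(\frac{1}{2\pi}\,\nabla\nabla^{T}\Psi(\mu_0)\right), \\
\Psi(\mu) &= -\ln\left[ P\left(X \mid \mu, H_i\right)\, P\left(\mu \mid H_i\right) \right]
\end{aligned} \qquad (8)
```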
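The selection step can be sketched as below, using the standard Laplace estimate of the negative log evidence, Ψ(μ_0) + ½ ln det(∇∇^{T}Ψ/2π). To stay self-contained the sketch assumes a diagonal Hessian, so the determinant reduces to a product of the diagonal entries; this simplification, the function name `laplace_optimality`, and all numbers in `candidates` are made up for illustration and are not results from the paper.

```python
import math

def laplace_optimality(psi_min, hessian_diag):
    """Laplace estimate of -ln P(X | H) for a diagonal Hessian (assumption):
    psi_min + 0.5 * sum(ln(h_k / (2*pi))) over the diagonal entries h_k."""
    return psi_min + 0.5 * sum(math.log(h / (2 * math.pi)) for h in hessian_diag)

# Keys are structural parameter sets (L, m_f1, m_f2, m_g); values are
# placeholder numbers standing in for the outcome of model learning.
candidates = {
    (1, 1, 1, 1): laplace_optimality(12.0, [4.0, 9.0]),
    (2, 3, 2, 1): laplace_optimality(11.5, [4.0, 9.0, 50.0, 80.0]),
}

# The best model is the one with the smallest optimality.
best = min(candidates, key=candidates.get)
print(best)
```

In this made-up example the larger model fits slightly better (smaller Ψ at the minimum) but is penalized by its additional, sharply constrained parameters, so the smaller model is selected.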
Data preparation
The original LR04 stack is sampled with nonconstant time step ranging from 1 kyr to 2.5 kyr for the time period [2.6 Ma, 0 Ma]. Each point of these series represents the average value of δ^{18}O on the corresponding time interval. Thus, we can consider the stack as a set of such time intervals. In order to construct the model described above, we resampled the LR04 stack with the constant time step 2.5 kyr and applied a smoothing window with the size w = 5 kyr simultaneously by the following procedure. For each time instance t of the new time series we took all time intervals of the original LR04 stack which intersect with or lie in [t − w/2, t + w/2]; the values from these intervals were averaged with the weights proportional to the sizes of the corresponding intersections and the resulting average value was taken as the new value at the time t.
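The resampling procedure can be sketched as follows. The representation of the stack as explicit `(start, end, value)` tuples and the function name `resample` are assumptions for illustration; how the interval boundaries are assigned to the original LR04 points is not specified here.

```python
def resample(intervals, t_new, w=5.0):
    """Overlap-weighted average of interval values in a window around each time.

    intervals  list of (start, end, value) with start < end, in kyr
    t_new      times at which the new series is evaluated, in kyr
    w          width of the smoothing window, in kyr
    """
    out = []
    for t in t_new:
        lo, hi = t - w / 2, t + w / 2
        num = den = 0.0
        for a, b, v in intervals:
            overlap = min(b, hi) - max(a, lo)  # length of the intersection
            if overlap > 0:
                num += overlap * v
                den += overlap
        # No overlapping interval: the value is undefined at this time.
        out.append(num / den if den > 0 else float("nan"))
    return out

# Made-up intervals covering 0 to 5 kyr; the window [0, 5] around t = 2.5
# overlaps them by 1, 2.5 and 1.5 kyr respectively.
example = resample([(0.0, 1.0, 1.0), (1.0, 3.5, 2.0), (3.5, 5.0, 4.0)], [2.5])
```

With a 2.5 kyr step and a 5 kyr window, neighboring windows overlap by half, which is what produces the mild smoothing described in the text.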
The insolation forcing time series was resampled from a time step of 1 kyr to a time step of 2.5 kyr using classic cubic spline interpolation.
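A minimal sketch of such a resampling is given below. The boundary conditions of the spline are not specified in the text, so a natural cubic spline (zero second derivative at both ends) is assumed here; the 41 kyr sinusoid is a synthetic placeholder rather than a real insolation series, and the function name is hypothetical.

```python
import math
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution; c[0] = c[n] = 0 encodes the natural boundary condition
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # segment containing x
        dx = x - xs[j]
        return ys[j] + dx * (b[j] + dx * (c[j] + dx * d[j]))

    return evaluate

# Synthetic forcing sampled every 1 kyr (placeholder for an insolation series)
t_old = [float(k) for k in range(101)]
f_old = [math.sin(2 * math.pi * t / 41.0) for t in t_old]
spline = natural_cubic_spline(t_old, f_old)
t_new = [2.5 * k for k in range(41)]  # 0, 2.5, ..., 100 kyr
f_new = [spline(t) for t in t_new]
print(len(f_new), round(f_new[1], 4))
```

New time instants that coincide with original samples reproduce the original values, and only the intermediate instants such as 2.5 kyr and 7.5 kyr are actually interpolated.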
Data Availability
The analyzed LR04 stack is open to the public.
References
1. Clark, P. U. et al. The middle Pleistocene transition: characteristics, mechanisms, and implications for long-term changes in atmospheric pCO_{2}. Quaternary Science Reviews 25, 3150–3184, https://www.sciencedirect.com/science/article/pii/S0277379106002332 (2006).
2. Maslin, M. A. & Brierley, C. M. The role of orbital forcing in the Early Middle Pleistocene Transition. Quaternary International 389, 47–55, https://www.sciencedirect.com/science/article/pii/S1040618215000701 (2015).
3. Elderfield, H. et al. Evolution of ocean temperature and ice volume through the mid-Pleistocene climate transition. Science (New York, N.Y.) 337, 704–9, http://www.ncbi.nlm.nih.gov/pubmed/22879512 (2012).
4. Gildor, H. & Tziperman, E. A sea ice climate switch mechanism for the 100-kyr glacial cycles. Journal of Geophysical Research 106, 9117, https://doi.org/10.1029/1999JC000120 (2001).
5. Crucifix, M. Oscillators and relaxation phenomena in Pleistocene climate theory. Philosophical transactions. Series A, Mathematical, physical, and engineering sciences 370, 1140–65, http://www.ncbi.nlm.nih.gov/pubmed/22291227, http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC3261435 (2012).
6. Rial, J. A., Oh, J. & Reischmann, E. Synchronization of the climate system to eccentricity forcing and the 100,000-year problem. Nature Geoscience 6, 289–293, http://www.nature.com/articles/ngeo1756 (2013).
7. Ditlevsen, P. D. Bifurcation structure and noise-assisted transitions in the Pleistocene glacial cycles. Paleoceanography 24, https://doi.org/10.1029/2008PA001673 (2009).
8. Huybers, P. Pleistocene glacial variability as a chaotic response to obliquity forcing. Climate of the Past 5, 481–488, http://www.clim-past.net/5/481/2009/ (2009).
9. Benzi, R., Parisi, G., Sutera, A. & Vulpiani, A. Stochastic resonance in climatic change. Tellus 34, 10–16, http://tellusa.net/index.php/tellusa/article/view/10782 (1982).
10. Tziperman, E., Raymo, M. E., Huybers, P. & Wunsch, C. Consequences of pacing the Pleistocene 100 kyr ice ages by nonlinear phase locking to Milankovitch forcing. Paleoceanography 21, https://doi.org/10.1029/2005PA001241 (2006).
11. Lisiecki, L. E. & Raymo, M. E. A Pliocene-Pleistocene stack of 57 globally distributed benthic δ^{18}O records. Paleoceanography 20, https://doi.org/10.1029/2004PA001071 (2005).
12. Molkov, Y. I., Loskutov, E. M., Mukhin, D. N. & Feigin, A. M. Random dynamical models from time series. Physical Review E 85, 036216, https://doi.org/10.1103/PhysRevE.85.036216 (2012).
13. Mukhin, D. et al. Predicting Critical Transitions in ENSO Models. Part I: Methodology and Simple Models with Memory. Journal of Climate 28, 1940–1961, https://doi.org/10.1175/JCLI-D-14-00239.1 (2015).
14. Mukhin, D. et al. Predicting Critical Transitions in ENSO models. Part II: Spatially Dependent Models. Journal of Climate 28, 1962–1976, https://doi.org/10.1175/JCLI-D-14-00240.1 (2015).
15. Berger, A., Li, X. & Loutre, M. Modelling northern hemisphere ice volume over the last 3 Ma. Quaternary Science Reviews 18, 1–11, https://www.sciencedirect.com/science/article/pii/S027737919800033X (1999).
16. Rial, J. Abrupt climate change: chaos and order at orbital and millennial scales. Global and Planetary Change 41, 95–109, https://www.sciencedirect.com/science/article/pii/S0921818103001875 (2004).
17. Tziperman, E. & Gildor, H. On the mid-Pleistocene transition to 100-kyr glacial cycles and the asymmetry between glaciation and deglaciation times. Paleoceanography 18, 1-1–1-8, https://doi.org/10.1029/2001pa000627 (2003).
18. Berger, A. & Loutre, M. Insolation values for the climate of the last 10 million years. Quaternary Science Reviews 10, 297–317, https://www.sciencedirect.com/science/article/pii/027737919190033Q (1991).
19. McManus, J. F., Oppo, D. W. & Cullen, J. L. A 0.5-Million-Year Record of Millennial-Scale Climate Variability in the North Atlantic. Science 283, 971–975, https://doi.org/10.1126/science.283.5404.971 (1999).
20. Schulz, M., Berger, W. H., Sarnthein, M. & Grootes, P. M. Amplitude variations of 1470-year climate oscillations during the last 100,000 years linked to fluctuations of continental ice mass. Geophysical Research Letters 26, 3385–3388, https://doi.org/10.1029/1999GL006069 (1999).
21. Raymo, M. E. & Nisancioglu, K. H. The 41 kyr world: Milankovitch’s other unsolved mystery. Paleoceanography 18, https://doi.org/10.1029/2002PA000791 (2003).
22. Gildor, H. & Tziperman, E. Sea ice as the glacial cycles’ climate switch: role of seasonal and orbital forcing. Paleoceanography 15, 605–615, https://doi.org/10.1029/1999PA000461 (2000).
23. Gavrilov, A., Loskutov, E. & Mukhin, D. Bayesian optimization of empirical model with state-dependent stochastic forcing. Chaos, Solitons and Fractals 104, 327–337, http://www.sciencedirect.com/science/article/pii/S0960077917303648 (2017).
24. Gavrilov, A. et al. Linear dynamical modes as new variables for data-driven ENSO forecast. Climate Dynamics, 1–18, https://doi.org/10.1007/s00382-018-4255-7 (2018).
25. Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366 (1989).
26. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980, 366–381 (Springer Berlin Heidelberg), https://doi.org/10.1007/BFb0091924.
27. Molkov, Y. I., Mukhin, D. N., Loskutov, E. M., Timushev, R. I. & Feigin, A. M. Prognosis of qualitative system behavior by noisy, nonstationary, chaotic time series. Physical Review E 84, 036215, https://doi.org/10.1103/PhysRevE.84.036215 (2011).
28. Loskutov, E. M., Molkov, Y. I., Mukhin, D. N. & Feigin, A. M. Markov chain Monte Carlo method in Bayesian reconstruction of dynamical systems from noisy chaotic time series. Physical Review E 77, 066214, https://doi.org/10.1103/PhysRevE.77.066214 (2008).
Acknowledgements
The paper was supported by the Government of the Russian Federation (Agreement No. 14.Z50.31.0033 with the Institute of Applied Physics of RAS). The implementation of the forced data-driven model was supported by the Russian Science Foundation (Grant No. 18-12-00231).
Author information
Contributions
All authors are co-PIs of the project; they contributed to its design and numerical algorithms and analyzed the results. D.M. wrote the manuscript and the Supplementary Information. A.G. performed the numerical calculations and plotted the figures. All authors reviewed the manuscript.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mukhin, D., Gavrilov, A., Loskutov, E. et al. Bayesian Data Analysis for Revealing Causes of the Middle Pleistocene Transition. Sci Rep 9, 7328 (2019). https://doi.org/10.1038/s41598-019-43867-3