A pseudoproxy emulation of the PAGES 2k database using a hierarchy of proxy system models

Paleoclimate reconstructions are now integral to climate assessments, yet the consequences of using different methodologies and proxy data require rigorous benchmarking. Pseudoproxy experiments (PPEs) provide a tractable and transparent test bed for evaluating climate reconstruction methods and their sensitivity to aspects of real-world proxy networks. Here we develop a dataset that leverages proxy system models (PSMs) for this purpose. PSMs emulate the essential physical, chemical, biological, and geological processes that translate climate signals into proxy records, making these synthetic proxies more relevant to the real world. We apply a suite of PSMs to emulate the widely used PAGES 2k dataset, including realistic spatiotemporal sampling and error structure. A hierarchical approach allows us to produce many variants of this base dataset, isolating the impact of sampling bias in time and space, representation error, sampling error, and other assumptions. Combining these various experiments produces a rich dataset (“pseudoPAGES2k”) for many applications. As an illustration, we show how to conduct a PPE with this dataset based on emerging climate field reconstruction techniques.

While pre-instrumental intercomparisons of reconstruction methods have occasionally been carried out with real-world proxy observations 22,23 , such efforts are fundamentally limited by the lack of a true benchmark: pre-instrumental climates were not, by definition, observed directly, so these intercomparisons can only inform on convergence or divergence, but cannot provide any metric of their closeness to the true climate.
To sidestep this hurdle, pseudoproxy experiments (PPEs) have long been used as a laboratory to benchmark climate reconstruction methods. The heart of PPEs is to start from the output of long integrations of a global climate model and to apply mathematical transformations to this output to mimic the processes whereby paleoclimate proxies register these climate variations in space and time 24. Because the original climate is specified, and sampled perfectly in space and time, the ability of a reconstruction to recover this climate is known. Moreover, as the generating process of these "pseudoproxies" is specified, it can be manipulated to yield insights into the sources of uncertainty contributing to reconstruction error. While simple PPE designs are informative, the more realistic the target climate and pseudoproxy generation process, the more relevant this benchmark becomes, so there is considerable potential in this avenue of research 16,25–31.
Initial work used the simplistic assumption that paleotemperature proxies were a linear superposition of local temperature and Gaussian (white) noise, sampled uniformly in time 32–34. Over time, more realistic pseudoproxy constructions were developed, involving other climate variables, more elaborate noise models, realistic spatiotemporal sampling, and noise levels approximating real proxy networks 16,28,35,36. Recent work 29 has leveraged more realistic proxy system modeling (PSM) frameworks 37–42 to capture the essential physical, chemical, biological and geological processes that translate climate signals into the paleoclimate records that form the basis of climate reconstruction efforts (e.g., ref. 43). However, such models have yet to gain widespread use, so even recent efforts have sometimes employed a simplistic "temperature + noise" pseudoproxy design 16,29,44.
The PAGES 2k Phase 2 global multi-proxy database (Fig. 1) has been widely used for studies of Common Era climate since its release 43. It has played a central role in investigating multi-decadal and longer-term surface temperature variability 45,46 and the spatiotemporal temperature patterns of various climatic epochs 23,47 over the Common Era. In addition, it has served as the principal data source for the latest version 2.1 of the Last Millennium Reanalysis (LMR) products 48, and has become a common network template for pseudoproxy studies 31,49. However, these PPE-related studies used only a partial network and employed a simplistic "temperature + noise" design, and a systematic pseudoproxy emulation of the PAGES 2k network has yet to be produced. The PAGES 2k Phase 2 network has a number of known biases that present challenges to global annual mean temperature reconstruction 50, which need to be rigorously evaluated.
Here, we do so by generating a pseudoproxy dataset that: (i) emulates the majority of the PAGES 2k Phase 2 network 43, (ii) employs a more realistic data-generating mechanism with proxy system models (PSMs) and isotope-enabled climate model simulations, and (iii) explicitly separates sensor, archive, and observational effects. By combining various pseudoproxy designs, noise levels, and spatiotemporal sampling scenarios, we generate many digital avatars of the PAGES 2k network, supporting the evaluation of climate reconstruction methods in a wide variety of settings. To illustrate the use of this dataset, we show its application to a suite of climate field reconstructions 10,48,51.

Methods
Reference climate. Our base climate utilizes the "iCESM1" last millennium simulation (iCESM-LM hereafter) generated by the isotope-enabled Community Earth System Model (iCESM) 52. As an addition to the standard CESM, iCESM simulates the isotopic water fluxes transported between its five major isotope-enabled components, including the atmosphere model iCAM, the land model iCLM, the ocean model iPOP, the sea ice model iCICE, and the river runoff model iRTM. The atmosphere model iCAM tracks water tracers and isotopes in all phases through processes such as surface fluxes, boundary layer mixing, cloud physics, convection, and advection, and simulates precipitation δ¹⁸O variability with high fidelity 53. The land model iCLM considers the water vapor flux and isotope fractionation in vegetated land surfaces 54. Main processes include water isotope exchanges among soil, spaces under and above canopy, and leaves. The land and vegetation types and amount of canopy use a modern climatological mean with a constant seasonal cycle 55. The ocean model iPOP transports water isotopes passively through resolved flow and parameterized turbulence, and the simulated seawater δ¹⁸O is validated under present-day climate conditions 56. The sea ice model iCICE simulates the sinks of the isotopic water mass through melting and sublimation processes, and the sources through snowfall, sea ice growth, and vapor condensation 52. All components coupled together provide a plausible simulation of the water isotope fields.
The iCESM-LM simulation applies the transient external forcings following the same setup as the CESM Last Millennium Ensemble (CESM-LME) 57. The solar forcing comes from the total solar irradiance reconstruction by Vieira et al. 58 patched with spectral variations from Schmidt et al. 59. The last millennium volcanic forcing is based on the ice core-based index by Gao et al. 60, while for the historical period, an eruption dataset by Ammann et al. 61 is adopted. The greenhouse gas forcing, namely the concentrations of the main long-lasting greenhouse gases (i.e., CO2, CH4, N2O), is derived from Antarctic ice core analyses by Schmidt et al. 59. For the land use and land cover boundary conditions, the reconstruction by Pongratz et al. 62 and that by Hurtt et al. 63 are merged together to yield a consistent land use change. The orbital forcing is computed in the model based on Berger et al. 64. The ozone forcing comes from the Whole Atmosphere Community Climate Model (WACCM) and the prescribed aerosol forcing are applied only over the historical period. For more details, please refer to CESM-LME 57.
Fig. 1 The PAGES 2k Phase 2 network 43.
Proxy network. Figure 1 shows the PAGES 2k Phase 2 Network 43. It consists of 692 records from 648 globally distributed sites, archived in trees, corals and sclerosponges, marine sediments, lake sediments, glacier ice, documentary sources, speleothems, boreholes, bivalves, and hybrid records. Each archive includes single or multiple observation types, among which tree ring width (TRW), maximum latewood density (MXD), coral and sclerosponge δ¹⁸O and Sr/Ca, lake varve thickness, and ice core δ¹⁸O are essential to Common Era temperature reconstructions (e.g., LMR), and their PSMs have already been developed by recent efforts 38. We therefore focus on these proxy types, and generate their emulations to form our pseudoproxy network (Fig. 2). For proxy sites located within the same model grid cell, the input climate signals are the same, while the generation mechanisms vary according to their proxy types.
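The pairing of proxy sites with model grid cells can be sketched as follows. This is a minimal illustration under stated assumptions, not the lookup used to build the dataset: the regular 10° grid, the two site coordinates, and the function names are hypothetical, and the "nearest" cell is chosen by great-circle distance to the cell centres.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_grid_cell(site_lat, site_lon, grid_lats, grid_lons):
    """Return the (lat index, lon index) of the grid cell centre closest to the site."""
    best = None
    for i, glat in enumerate(grid_lats):
        for j, glon in enumerate(grid_lons):
            d = great_circle_km(site_lat, site_lon, glat, glon)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]

# Hypothetical coarse grid (cell centres every 10 degrees), for illustration only.
grid_lats = [-85.0 + 10 * i for i in range(18)]  # -85 ... 85
grid_lons = [5.0 + 10 * j for j in range(36)]    # 5 ... 355

# Two hypothetical sites that fall in the same grid cell and therefore
# would receive the same input climate signal.
site_a = nearest_grid_cell(46.2, 8.1, grid_lats, grid_lons)
site_b = nearest_grid_cell(47.9, 3.0, grid_lats, grid_lons)
```

Two sites that map to the same indices share their climate input, so any difference between their pseudoproxies comes from the proxy system model applied to each.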
Proxy system modeling. Following the proxy system modeling framework 37, we build our pseudoproxy network based on the iCESM output, leveraging the PSMs from the PRYSM toolbox 38 and the CFR codebase 65. The concept of PSMs encompasses both geophysical/chemical/biological process-based models, as well as statistical models; both can be either linear or nonlinear. In this study, both categories of PSMs have been adopted, depending on availability. A given PSM can only be applied if its inputs are within the scope of the available climate variables. In addition, the more complex the PSM, the more parameters it contains, and these parameters must generally be fitted to modern observations, lest they introduce additional sources of uncertainty.
Fig. 2 The spatiotemporal availability of the PAGES 2k pseudoproxy network with realistic and full temporal availability.
As in all modeling endeavors, the choice of PSM is therefore a trade-off between "sins of omission" (excessive simplicity) and "sins of commission" (excessive complexity). The present dataset used the most complex PSMs where justified by scientific understanding and available data. When these conditions were not met, simpler PSMs were selected to avoid sins of commission or logistical hurdles (e.g., model fields available at too coarse a resolution).
Statistical PSMs, although highly idealized, are still based on scientific understanding of the geophysical/chemical/biological processes leading to the transduction of climate signals into proxy archives. As shown in Tardif et al. 48, even linear, statistical PSMs for tree-ring width that include bivariate and seasonal dependence can yield vastly more realistic results than the traditional fitting to annual temperature.
Forward modeling of tree ring width (TRW). Tree-ring width (TRW) is a major observation source for investigating Common Era climate. In the PAGES 2k database, TRW represents the largest network with 354 records, most of which are located in the Northern Hemisphere. Depending on the location and species, TRW chronologies may record not only temperature variations but also moisture conditions, although the climatic signals can be modulated by biological memory effects 49,50,66–72. The relationship between TRW and the environmental variables is thus complex, and TRW PSMs with various complexity levels have been developed since 2000, including TREERING2000 73, Vaganov-Shashkin (VS) 74 and its simplified version VS-Lite 75–77, MAIDEN (Modeling and Analysis In DENdroecology) 78,79, and even the land model ORCHIDEE (ORganizing Carbon and Hydrology In Dynamic EcosystEms) 80.
This work used VS-Lite developed by Tolwinski-Ward et al. 75–77 to generate our pseudo-TRW network because of its overall skill 79,81, simplicity, and capacity to be widely applied to the PAGES 2k sites. VS-Lite takes monthly temperature and precipitation signals as input, and emulates a threshold-dependent tree-ring monthly growth response to the climate with piece-wise linear growth response functions (Eq. (1)) determined by four parameters: the lower and upper thresholds for temperature and soil moisture, respectively:

$$g_V(t) = \begin{cases} 0, & V(t) < V_1 \\ \dfrac{V(t) - V_1}{V_2 - V_1}, & V_1 \le V(t) \le V_2 \\ 1, & V(t) > V_2 \end{cases} \qquad (1)$$

where V represents T (temperature) or M (moisture), and V_1 and V_2 are the corresponding lower and upper thresholds. The overall growth response is then the minimum of these two response functions modulated by the insolation-based growth response (g_E) (Eq. (2)), which is determined by the latitude of the site, and the final output TRW is the standardized series of the annual integration of the monthly growths, with an error term added on (Eq. (3)):

$$g(t) = g_E(t)\,\min\{g_T(t),\, g_M(t)\} \qquad (2)$$

$$\mathrm{TRW} = \mathrm{standardize}\left(\sum_{t=t_s}^{t_e} g(t)\right) + \zeta \qquad (3)$$

where t represents time in months, t_s and t_e denote the window for the integration, and ζ is a pink noise term (i.e., a stochastic process with a spectral density $S(f) \propto f^{-\beta}$, with β a positive constant). Setting t_s < 0 (a specific month in the previous year) can help mimic the biological memory effect or other unaccounted-for sources of low-frequency variability in TRW 66,69,72,82–85. Following ref. 75, we set t_s = −4 and t_e = 12, which represents an integration window from September of the previous year to December of the current year for the Northern Hemisphere, and from March of the current year to June of the next year for the Southern Hemisphere. The pink noise term is added to further mimic other non-climatic processes such as the detrending process of TRW records, following the formulation of colored noise proposed in ref. 86 with tuned spectral scaling slope 87,88 β = 2 and SNR = 1 (signal-to-noise ratio defined in standard deviation 24,26,89). Without this term, the scaling slope of the simulated TRWs will be significantly flatter than that observed in the real records, especially for the low-frequency band 40. The need for this can be viewed in two ways. On one hand, it suggests that tree-ring width records in the PAGES 2k database contain more low-frequency variability than expected from the climate signal and simple persistence structure present in VS-Lite alone, perhaps due to data processing (detrending and standardization) or unaccounted-for biological or ecological processes. Alternatively, this can be seen as a result of a "sin of omission" in VS-Lite and an incomplete mimic of the full range of biological processes important for the autocorrelation structure of temperature-sensitive tree-ring series.
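The threshold response, limiting-factor rule, and windowed integration described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the VS-Lite code itself: the threshold values, the synthetic forcing, and the constant insolation response are hypothetical; soil moisture is passed in directly rather than derived from precipitation; the noise term ζ is omitted; and reading a negative t_s as month 13 + t_s of the previous year (so that −4 is September) is an assumption made to match the text.

```python
import math

def ramp(v, v1, v2):
    """Piece-wise linear response: 0 below v1, 1 above v2, linear in between."""
    if v < v1:
        return 0.0
    if v > v2:
        return 1.0
    return (v - v1) / (v2 - v1)

def monthly_growth(temp, moist, g_e, t1, t2, m1, m2):
    """Insolation response times the more limiting of the two climate responses."""
    return [ge * min(ramp(t, t1, t2), ramp(m, m1, m2))
            for t, m, ge in zip(temp, moist, g_e)]

def annual_integration(growth, t_s=-4, t_e=12):
    """Sum monthly growth over the window [t_s, t_e] for every year but the first.

    growth is a flat monthly list starting in January of year 0.
    """
    n_years = len(growth) // 12
    totals = []
    for year in range(1, n_years):  # year 0 has no previous year to draw on
        if t_s < 0:
            start = (year - 1) * 12 + (12 + t_s)
        else:
            start = year * 12 + (t_s - 1)
        stop = year * 12 + t_e  # exclusive end index
        totals.append(sum(growth[start:stop]))
    return totals

def standardize(x):
    """Remove the mean and divide by the (population) standard deviation."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / sd for v in x]

# Synthetic forcing: a seasonal temperature cycle that warms by 0.5 per year,
# constant soil moisture, and a constant insolation response (all hypothetical).
n_years = 6
temp = [8.0 - 10.0 * math.cos(2 * math.pi * m / 12) + 0.5 * y
        for y in range(n_years) for m in range(12)]
moist = [0.3] * (12 * n_years)
g_e = [1.0] * (12 * n_years)

growth = monthly_growth(temp, moist, g_e, t1=4.0, t2=17.0, m1=0.02, m2=0.5)
trw = standardize(annual_integration(growth))
```

In this toy setting the gradual warming relaxes the temperature limitation in spring and autumn, so the standardized ring widths increase from year to year.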
The four threshold parameters T_1, T_2, M_1, M_2 are crucial to the behavior of the model. We calibrate them against the CRUTS monthly temperature and precipitation observations 90 version 4.05, using the Bayesian inference method elaborated in ref. 76. This essentially generates optimal posterior probability distributions for each threshold parameter by updating the prior distributions over Monte-Carlo iterations, and yields the estimate of each parameter following maximum likelihood estimation (MLE). With the calibrated parameters, iCESM-simulated monthly temperature and precipitation signals can be translated to the corresponding pseudo-TRW series. An example of the generated pseudo-TRW chronology and its comparison to the real-world counterpart in time and frequency domains is shown in Fig. 3.

Forward modeling of maximum latewood density (MXD).
Compared to TRW, maximum latewood density (MXD) more faithfully tracks growing season temperature history without distortions due to biological memory effects 49,50,68,69,84,91–95. As there is not yet a published, tractable proxy system model for MXD, here we use a simple univariate linear model to emulate the behavior of MXD series:

$$\mathrm{MXD} = a \cdot T_{\mathrm{seasonal}} + b$$

where a represents a linear slope factor, T_seasonal the average temperature over the growing season, and b the intercept. The growing season is calibrated against the CRUTS dataset, version 4.05. Following ref. 48, the season that yields the optimal regression skill is picked from an expert-curated pool of growing season candidates, including the default calendar year option (Jan-Dec) and variants of warm seasons (i.e., Jun-Aug, Mar-Aug, Jun-Nov for Northern Hemisphere, and Dec-Feb, Dec-May, Sep-Feb for Southern Hemisphere) during which trees are expected to grow. An example of a generated pseudo-MXD chronology is shown in Fig. 4.
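The season selection described above amounts to fitting the linear model once per candidate season and keeping the best fit. The sketch below illustrates this with ordinary least squares and R² as the skill measure, which is an assumption (the text only says "optimal regression skill"). The data are synthetic, the function names are hypothetical, and only the Northern Hemisphere candidates are handled because they do not cross a calendar-year boundary.

```python
import random

def seasonal_mean(monthly, months):
    """Average a flat monthly series (January of year 0 first) over 1-based calendar months."""
    n_years = len(monthly) // 12
    return [sum(monthly[12 * y + m - 1] for m in months) / len(months)
            for y in range(n_years)]

def fit_line(x, y):
    """Ordinary least squares fit y = a * x + b; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx, sxy ** 2 / (sxx * syy)

# Northern Hemisphere candidate seasons listed in the text.
NH_CANDIDATES = {
    "Jan-Dec": list(range(1, 13)),
    "Jun-Aug": [6, 7, 8],
    "Mar-Aug": [3, 4, 5, 6, 7, 8],
    "Jun-Nov": [6, 7, 8, 9, 10, 11],
}

def pick_season(monthly_temp, mxd, candidates):
    """Return (season name, a, b, r_squared) for the candidate with the highest R^2."""
    best = None
    for name, months in candidates.items():
        a, b, r2 = fit_line(seasonal_mean(monthly_temp, months), mxd)
        if best is None or r2 > best[3]:
            best = (name, a, b, r2)
    return best

# Synthetic check: build an "observed" MXD series from Jun-Aug temperature
# with a = 0.1 and b = 0.5, then verify that the selection recovers them.
rng = random.Random(0)
monthly_temp = [rng.gauss(10.0, 3.0) for _ in range(12 * 40)]
mxd_obs = [0.1 * t + 0.5 for t in seasonal_mean(monthly_temp, [6, 7, 8])]
season, a, b, r2 = pick_season(monthly_temp, mxd_obs, NH_CANDIDATES)
```

Once a and b are calibrated for the selected season, the same seasonal averaging applied to model temperature yields the pseudo-MXD series.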
Forward modeling of coral and sclerosponge δ¹⁸O. In contrast to trees, corals and sclerosponges mainly cover the tropical ocean regions and are thus of great importance to investigating tropical climate variability, including El Niño Southern Oscillation (ENSO) 17,96–101. Following Brown et al. 102, we use a bilinear model to simulate coral and sclerosponge δ¹⁸O based on the annual sea surface temperature (SST) and seawater δ¹⁸O (denoted as δ¹⁸O_sw) signals:

$$\delta^{18}\mathrm{O}_{\mathrm{coral}} = a \cdot \mathrm{SST} + b \cdot \delta^{18}\mathrm{O}_{\mathrm{sw}}$$

where a = −0.22 represents the linear slope factor, and b = 0.97002 the conversion factor from VSMOW to VPDB. Thompson et al. 103 state that since the δ¹⁸O_sw network is scarce, they use sea surface salinity (SSS) to estimate δ¹⁸O_sw. However, a salinity-based PSM is reliant on the SSS/δ¹⁸O_sw relationships that are known to be nonstationary and are based on extremely limited observational data 104; the original formulation based on δ¹⁸O_sw is thus preferable given that the iCESM output is leveraged. An example of the generated pseudo-coral/sclerosponge δ¹⁸O chronology is shown in Fig. 5.
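As a small worked example of the bilinear model, the sketch below applies the quoted coefficients (a = −0.22, b = 0.97002) to a few hypothetical annual values. The functional form, a weighted sum of SST and seawater δ¹⁸O, is assumed from the description above, and the input numbers are invented for illustration.

```python
def coral_d18o(sst, d18o_sw, a=-0.22, b=0.97002):
    """Bilinear pseudo-coral d18O from annual SST (deg C) and seawater d18O.

    a is the slope on SST; b converts seawater d18O from VSMOW to VPDB.
    """
    return [a * t + b * d for t, d in zip(sst, d18o_sw)]

# Hypothetical annual values: SST warms by 1 deg C while seawater d18O is constant.
sst = [27.5, 28.0, 28.5]
d18o_sw = [0.50, 0.50, 0.50]
pseudo_coral = coral_d18o(sst, d18o_sw)
```

With seawater δ¹⁸O held fixed, the 1 °C warming between the first and last year lowers the pseudo-coral δ¹⁸O by 0.22, which is the temperature sensitivity encoded by a.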
Forward modeling of coral and sclerosponge Sr/Ca. The skeletal trace element ratio Sr/Ca in corals and sclerosponges has a straightforward temperature interpretation. Following Corrège et al. 105, we apply a simple univariate linear model based on the annual SST signal, but with fixed parameters:

$$\mathrm{Sr/Ca} = a \cdot \mathrm{SST} + b$$

where a represents the linear slope factor with a Gaussian distribution with mean of −0.06 and standard deviation of 0.01, and b is the intercept with a mean value around 10.553 based on Table 1 in Corrège et al. 105. In this study, we take a = −0.06 and b = 10.553. An example of the generated pseudo-coral/sclerosponge Sr/Ca chronology is shown in Fig. 6.
Fig. 3 The dashboard for the tree ring width record "NAm_136" in dataset "ppwn_SNRinf_rta". The unit "NA" stands for "not applicable" as the variable is a standardized index and thus unitless. "PSD" refers to power spectral density and is in the unit of power (squared unit of the proxy variable) per year.
Fig. 4 The dashboard for the maximum latewood density record "NAm_134" in dataset "ppwn_SNRinf_rta". The unit "NA" stands for "not applicable" as the variable is a standardized index and thus unitless. "PSD" refers to power spectral density and is in the unit of power (squared unit of the proxy variable) per year.
Fig. 5 The dashboard for the coral δ¹⁸O record "Ocn_075" in dataset "ppwn_SNRinf_rta". "PSD" refers to power spectral density and is in the unit of power (squared unit of the proxy variable) per year.
Table 1 The seasonality of each lake varve thickness site.
Fig. 6 The dashboard for the coral Sr/Ca record "Ocn_067" in dataset "ppwn_SNRinf_rta". "PSD" refers to power spectral density and is in the unit of power (squared unit of the proxy variable) per year.
Forward modeling of ice core δ¹⁸O. Glacier ice cores mainly cover the polar and mountain regions, where trees cannot grow. They are usually well-preserved and span a long time interval with annual time resolution, and are important for investigating long-term climate change. For ice core δ¹⁸O, we apply the corresponding module in the PRYSM toolbox 38, which is based on the work of Johnsen. The precipitation-weighted δ¹⁸O is first computed as:

$$\delta^{18}\mathrm{O}_{\mathrm{weighted}} = \frac{\sum p \cdot \delta^{18}\mathrm{O}_{p}}{\sum p}$$

where p represents the monthly precipitation amount, and δ¹⁸O_p the precipitation δ¹⁸O. The precipitation-weighted δ¹⁸O is then corrected based on the elevation difference between the proxy site and its nearest model grid point with a rate of −0.25 per 100 meters. Next, its archive model emulates the compaction and diffusion processes of isotopes in ice via a convolution with a Gaussian kernel G:

$$\delta^{18}\mathrm{O}_{\mathrm{ice}} = G * \delta^{18}\mathrm{O}_{\mathrm{weighted}}$$

An example of the generated pseudo-ice core δ¹⁸O chronology is shown in Fig. 7.
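The three steps above (precipitation weighting, elevation correction, Gaussian smoothing) can be chained as in the following sketch. It is not the PRYSM implementation: the weighting is assumed to be done per calendar year, the sign convention of the elevation difference (site minus grid point) is an assumption, and a fixed kernel width `sigma` stands in for the compaction and diffusion physics. All inputs are synthetic.

```python
import math

def precip_weighted(d18o_p, precip):
    """Annual precipitation-weighted d18O from flat monthly series (12 values per year)."""
    n_years = len(precip) // 12
    out = []
    for y in range(n_years):
        p = precip[12 * y:12 * (y + 1)]
        d = d18o_p[12 * y:12 * (y + 1)]
        out.append(sum(pi * di for pi, di in zip(p, d)) / sum(p))
    return out

def altitude_correction(series, site_elev_m, grid_elev_m, rate_per_100m=-0.25):
    """Shift the series by rate_per_100m for every 100 m the site lies above the grid point."""
    shift = rate_per_100m * (site_elev_m - grid_elev_m) / 100.0
    return [v + shift for v in series]

def gaussian_smooth(series, sigma):
    """Convolve with a Gaussian kernel truncated at 3 sigma, renormalised at the edges."""
    half = int(math.ceil(3 * sigma))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    out = []
    for i in range(len(series)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(series):
                num += w * series[j]
                den += w
        out.append(num / den)
    return out

# Synthetic monthly inputs: seasonal cycles plus a year-to-year alternation in d18O.
n_years = 20
precip = [1.0 + 0.5 * math.sin(2 * math.pi * m / 12)
          for _ in range(n_years) for m in range(12)]
d18o_p = [-25.0 + 5.0 * math.sin(2 * math.pi * m / 12) + (1.0 if y % 2 else -1.0)
          for y in range(n_years) for m in range(12)]

weighted = precip_weighted(d18o_p, precip)
corrected = altitude_correction(weighted, site_elev_m=3000.0, grid_elev_m=2000.0)
d18o_ice = gaussian_smooth(corrected, sigma=1.5)
```

The smoothing step damps the year-to-year alternation in the synthetic input, which is the kind of high-frequency attenuation the archive model is meant to reproduce.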
Forward modeling of lake varve thickness. Varves, or annually laminated sediments, can be valuable temperature proxies for the Common Era due to their high resolution and because they can be found in areas where other annually resolved archives are absent. Varve thickness and mass accumulation rate are directly related to sediment input and deposition, which in turn can be strongly related to climate in some lakes. However, many phenomena can affect varve thickness, and the relationship between climatic and environmental drivers and varve thickness is often complex and typically varies from lake to lake 111. Temperature-driven varve thickness records are most common in the Arctic, where summer temperature can have strong and direct impacts on sediment transportation by melting winter snowpack and glaciers and extending the ice-free season.
The PAGES 2k Phase 2 database includes eight sites with varve thickness records interpreted to respond to temperature.Mechanistically simulating varve thickness is complex, highly site-specific, and not practical for most PPE studies.Nevertheless, most varve thickness records share characteristics that are readily and simply simulated.There are two key processes that we simulate.First, because varve thickness measures a depositional process, the distribution of varve thickness is zero-bounded and right-skewed, and is appropriately modeled with a Poisson or Gamma distribution 112 .Second, varve thickness records typically include substantial year-to-year memory.Unlike most sedimentary records, this is not due to bioturbation or post-depositional mixing (as this would destroy the laminations).However, glacial and sedimentary processes in the watershed and in the lake can be prone to significant memory, which affects the spectral characteristics of varve thickness records.
Here, we apply a simple model as below:

$$\text{varve thickness} = \Gamma(T_{\mathrm{seasonal}} + b)$$

where Γ(·) represents a mapping from the original distribution to a Gamma distribution, T_seasonal is the seasonally-averaged temperature calculated based on the seasonality metadata of each site (Table 1) 43, and b is a realization of fractional Brownian motion with Hurst index H = 0.75 and SNR = 1, a combination we find fits well with the real records. An example of the generated pseudo-lake varve thickness chronology is shown in Fig. 8.
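One simple way to realize a mapping onto a Gamma distribution is rank matching: each value is replaced by the Gamma variate of the same rank. The sketch below illustrates this idea under several assumptions. The Gamma shape and scale are hypothetical, the rank-matching approach is one possible choice rather than the method used for the dataset, and an ordinary random walk (H = 0.5) is used as a plain stand-in for fractional Brownian motion with H = 0.75, which is not implemented here.

```python
import math
import random

def pstd(x):
    """Population standard deviation."""
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def scale_to_snr(signal, noise, snr):
    """Rescale noise so that std(signal) / std(noise) equals snr."""
    factor = pstd(signal) / (snr * pstd(noise))
    return [n * factor for n in noise]

def gamma_map(x, shape=2.0, scale=1.0, seed=0):
    """Rank-based mapping: the k-th smallest input receives the k-th smallest Gamma draw."""
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(shape, scale) for _ in x)
    order = sorted(range(len(x)), key=lambda i: x[i])
    out = [0.0] * len(x)
    for rank, i in enumerate(order):
        out[i] = draws[rank]
    return out

rng = random.Random(1)
t_seasonal = [rng.gauss(5.0, 1.0) for _ in range(200)]  # synthetic seasonal temperature

# Stand-in for the memory term b: an ordinary random walk, not fBm with H = 0.75.
walk, level = [], 0.0
for _ in t_seasonal:
    level += rng.gauss(0.0, 1.0)
    walk.append(level)
b = scale_to_snr(t_seasonal, walk, snr=1.0)

driver = [t + n for t, n in zip(t_seasonal, b)]
varve = gamma_map(driver)
```

The output is strictly positive and right-skewed by construction, while the ordering of thick and thin years follows the temperature-plus-memory driver.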

Fig. 7 The dashboard for the ice core δ¹⁸O record "Arc_029" in dataset "ppwn_SNRinf_rta". "PSD" refers to power spectral density and is in the unit of power (squared unit of the proxy variable) per year.
Pseudoproxy production workflow. Figure 9 shows the general procedure for pseudoproxy generation.
The starting point is the isotope-enabled Community Earth System Model (iCESM) last millennium plus historical simulation 52 (Section Reference Climate), chosen so that the isotope-related proxies can be simulated with minimal assumptions. Environmental variables are taken from the iCESM output, including surface air temperature, precipitation rate, sea level pressure, precipitation δ¹⁸O, seawater δ¹⁸O, and sea surface temperature (SST). Proxy metadata are taken from the PAGES 2k dataset (Section Proxy Network), including the location information, time axis, archive type, sensor, species, seasonality, etc.
Fig. 8 The dashboard for the lake varve thickness record "Arc_025" in dataset "ppwn_SNRinf_rta". "PSD" refers to power spectral density and is in the unit of power (squared unit of the proxy variable) per year.
These two sources of information (environmental variables and proxy metadata) are then fed to the PSMs for tree-ring width (TRW), maximum latewood density (MXD), coral/sclerosponge δ 18 O, coral/sclerosponge Sr/Ca, lake varve thickness, and ice core δ 18 O, which translate the climatic signals and generate the raw output in proxy space (Section Proxy Modeling).
The raw output is then treated as signal, and white noise is added with a set of signal-to-noise ratio (SNR, defined in standard deviation 24,26,89) options (∞, 10, 2, 1, 0.5, 0.25) 22,28, where SNR = ∞ is a noise-free case, SNR = 1 means that the signal and noise share an equal standard deviation, etc. We generate datasets with two types of temporal sampling: (1) full annual sampling over 850-2005 CE, and (2) the realistic temporal availability of each record (Fig. 2).
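The noise and sampling steps can be sketched as follows. This is a minimal illustration with hypothetical function names and a synthetic signal: the noise standard deviation is set to std(signal) / SNR, SNR = ∞ returns the signal unchanged, and the "realistic" availability window used here (1900-2005 CE) is invented for the example.

```python
import math
import random

def pstd(x):
    """Population standard deviation."""
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def add_white_noise(signal, snr, seed=0):
    """Add Gaussian white noise with standard deviation std(signal) / snr."""
    if math.isinf(snr):
        return list(signal)  # noise-free case
    rng = random.Random(seed)
    sigma = pstd(signal) / snr
    return [s + rng.gauss(0.0, sigma) for s in signal]

def subsample(years, values, available_years):
    """Keep only the (year, value) pairs for which the real record has data."""
    keep = set(available_years)
    return [(y, v) for y, v in zip(years, values) if y in keep]

years = list(range(850, 2006))  # full annual sampling, 850-2005 CE
signal = [math.sin(2 * math.pi * y / 60.0) for y in years]  # synthetic PSM output

# One noisy variant per SNR option listed in the text.
variants = {snr: add_white_noise(signal, snr) for snr in (math.inf, 10, 2, 1, 0.5, 0.25)}

# Hypothetical record that only covers 1900-2005 CE.
rta = subsample(years, variants[1], range(1900, 2006))
noise_snr1 = [n - s for n, s in zip(variants[1], signal)]
```

Because the SNR is defined on standard deviations, halving the SNR doubles the noise standard deviation rather than the noise variance.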
Because iCESM is a biased representation of reality, the pseudoproxies generated by this workflow inherit some of the same biases in low-order statistics like mean and variance. To facilitate comparison with real-world records, we apply a bias correction and variance matching against the real records, using the mean and variance of the real proxy measurements over the timespan they share with the pseudoproxy counterpart. Note that this step shifts and scales the time series, but has no impact on the statistical distribution (e.g., Gaussian or Gamma), nor the spectral characteristics (i.e., scaling slopes and peaks) of the simulated proxies.
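The bias correction and variance matching is a shift-and-scale operation, as the short sketch below shows. The function names and the numbers are hypothetical, and both series are assumed to have already been restricted to their common timespan.

```python
import statistics

def match_mean_and_variance(pseudo, real):
    """Shift and scale pseudo so that its mean and standard deviation equal those of real."""
    mp, sp = statistics.fmean(pseudo), statistics.pstdev(pseudo)
    mr, sr = statistics.fmean(real), statistics.pstdev(real)
    return [(v - mp) / sp * sr + mr for v in pseudo]

def zscore(x):
    """Standardized version of a series, used here to compare shapes."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return [(v - m) / s for v in x]

# Hypothetical values over a common timespan.
pseudo = [0.2, 1.1, 0.7, 1.9, 1.4, 0.5]
real = [10.0, 12.5, 11.0, 15.0, 13.0, 9.5]
adjusted = match_mean_and_variance(pseudo, real)
```

Since the transformation is linear with a positive scale factor, the standardized series is identical before and after, which is why the distribution shape and spectral slope are unaffected.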
As a benchmark, we also generate pseudoproxies following the traditional temperature-plus-noise method: the temperature signal at the grid cell nearest each proxy site is perturbed with white noise, using the same set of SNR options and the same two types of temporal sampling.
This workflow results in multiple pseudoproxy emulations of the PAGES 2k network, differing in:
• design: either the "temperature-plus-white-noise" model (tpwn) or the proxy system models described above, with added white noise (ppwn);
• noise level: as quantified by the SNR of ∞ (pure signal), 10, 2, 1, 0.5 or 0.25;
• temporal sampling: either full annual sampling over 850-2005 CE (fta), or the realistic temporal availability of Fig. 2 (rta).

Data records
Table 2 A list of the "pseudoPAGES2k" pseudoproxy datasets. "SNR" refers to signal-to-noise ratio defined in standard deviation 24,26,89. The full sampling refers to 850-2005 CE with annual resolution, and the realistic sampling refers to the realistic temporal availability of each record.
Table 2 lists the pseudoproxy datasets generated in this study, which we call "pseudoPAGES2k". The dataset IDs indicate the properties of each dataset. For instance, "tpwn_SNR10_fta" means that the pseudoproxies are generated with the temperature-plus-white-noise method with SNR equal to 10 and full temporal availability, while "ppwn_SNR0.5_rta" means that the pseudoproxies are generated via the PSM hierarchy with added white noise and SNR equal to 0.5, and that the realistic temporal availability is applied, and so on. The datasets are archived at Zenodo 113.
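The naming convention can be made explicit with a short sketch that builds and parses dataset IDs. The pattern `<design>_SNR<value>_<sampling>` is inferred from the examples quoted in the text; that every combination of design, SNR, and sampling is actually present in the archive is an assumption, and `parse_dataset_id` is a hypothetical helper.

```python
from itertools import product

designs = ["tpwn", "ppwn"]                        # temperature-plus-noise vs PSM-based
snr_labels = ["inf", "10", "2", "1", "0.5", "0.25"]
samplings = ["fta", "rta"]                        # full vs realistic temporal availability

# Assumed full factorial combination of the three properties.
dataset_ids = [f"{d}_SNR{s}_{t}" for d, s, t in product(designs, snr_labels, samplings)]

def parse_dataset_id(dataset_id):
    """Split an ID such as 'ppwn_SNR0.5_rta' into its three properties."""
    design, snr, sampling = dataset_id.split("_")
    return {"design": design, "snr": float(snr[len("SNR"):]), "sampling": sampling}

example = parse_dataset_id("ppwn_SNR0.5_rta")
noise_free = parse_dataset_id("ppwn_SNRinf_rta")
```

Parsing the ID in this way makes it straightforward to loop over noise levels or sampling schemes when comparing reconstruction skill across variants.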
The "iCESM1" last millennium simulation (iCESM-LM) used in this study can be accessed at a data server hosted by Robert Tardif at the University of Washington (https://atmos.washington.edu/~rtardif/LMR/prior).
The PAGES 2k Phase 2 Network used in this study can be accessed at the National Centers for Environmental Information's World Data Service for Paleoclimatology (https://www.ncei.noaa.gov/access/paleo-search/study/21171).

Technical Validation
To verify whether the generation procedure (Fig. 9) yields a realistic pseudoproxy emulation of the PAGES 2k database, we validate the generated pseudoproxies against the original records' statistics, in both time and frequency domains. We emphasize that this is a validation specifically of the realism (and therefore utility) of the pseudoproxy generation procedure, rather than an evaluation of any single PSM or GCM, which has been done elsewhere. Rather, we aim to show that coupling these models, imperfect though they may be, can produce pseudoproxies that emulate key characteristics of the target series. In the time domain, a good pseudoproxy emulation should reproduce the probability distribution shape of the real proxies; this may be assessed via split violin plots. In the frequency domain, a good emulation should reproduce the power spectral density (PSD) of the target series, indicating the energy partitioning per frequency interval, particularly the periodic and continuum 114 characteristics of the series.
Figures 3 to 8 show examples for specific records, one site per proxy type. Since the real records may be unevenly spaced in time, we leverage the Weighted Wavelet Z-transform (WWZ) method implemented in Pyleoclim 115 to obtain the PSD curves. As illustrated by the PSD plots and the probability distribution plots, the pseudoproxies show an overall good agreement with the real records, including, for instance, the steep attenuation of high frequencies in the ice core δ¹⁸O record shown in Fig. 7, and the long-tailed distribution of the varve thickness record shown in Fig. 8. To validate the spectral characteristics thoroughly, Fig. 10 shows the spectral analysis by proxy type. It can be seen that overall good agreement is achieved between the pseudoproxies and their real counterparts, indicating a realistic emulation from the spectral perspective. This should result in more realistic assessments of the spectral characteristics of reconstruction skill. We emphasize that the procedure of bias correction and variance matching has no bearing on these aspects of the validation, as it simply adds a scale and offset to the pseudoproxies, without modifying their probability distribution shape or spectral characteristics.
Fig. 10 Spectral analysis of the pseudoproxy records in dataset "ppwn_SNRinf_rta" by proxy type. The gray curves denote the power spectral density (PSD, in the unit of power per year, i.e., squared unit of the proxy variable per year) of the real records, while the colored curves denote that of the pseudoproxy records.

Usage Notes
To illustrate the many potential uses of this dataset, we provide Jupyter notebooks (Code Availability) for the basic analysis and visualization of the dataset, as well as applications to climate field reconstruction. Specifically, we provide Python-based examples of:
• Loading and visualizing the pseudoPAGES2k dataset.
• Filtering the pseudoPAGES2k dataset according to various criteria.
• Applying Paleoclimate Data Assimilation (in the vein of the Last Millennium Reanalysis 48) to the pseudoPAGES2k dataset, and using it to benchmark climate field reconstruction methods.
Other potential uses of this dataset and its production workflow include optimal sampling design 116 .A natural extension would be to add age uncertainties to these pseudoproxies, as done in ref. 31 .

Fig. 9 Flow chart of the general procedure for pseudoproxy generation.