ModE-RA: a global monthly paleo-reanalysis of the modern era 1421 to 2008

Valler, Veronika; Franke, Jörg; Brugnara, Yuri; Samakinwa, Eric; Hand, Ralf; Lundstad, Elin; Burgdorf, Angela-Maria; Lipfert, Laura; Friedman, Andrew Ronald; Brönnimann, Stefan

doi:10.1038/s41597-023-02733-8


Data Descriptor
Open access
Published: 05 January 2024


Scientific Data volume 11, Article number: 36 (2024)


Abstract

The Modern Era Reanalysis (ModE-RA) is a global monthly paleo-reanalysis covering the period between 1421 and 2008. To reconstruct past climate fields, an offline data assimilation approach is used, blending information from an ensemble of transient atmospheric model simulations and observations. In the early period, ModE-RA utilizes natural proxies and documentary data, while from the 17th century onward instrumental measurements are also assimilated. The impact of each observation on the reconstruction is stored in the observation feedback archive, which provides additional information on the input data such as preprocessing steps and the regression-based forward models. The monthly resolved reconstructions include estimates of the most important climate fields. Furthermore, we provide a reconstruction, ModE-RAclim, which together with ModE-RA and the model simulations allows the roles of observations and model forcings to be disentangled. ModE-RA is best suited to study intra-annual to multi-decadal climate variability and to analyze the causes and mechanisms of past extreme climate events.


Background & Summary

Analyzing longer time periods than what is covered by modern instruments helps to gain further insights into the climate system, its variability and understanding of historical climate events. However, to study the period before the availability of state-of-the-art instrumental measurements, e.g., to analyze the intra-annual dynamics of past climate changes, datasets with a high temporal resolution are required. Thanks to various initiatives, several new climate reconstructions of the Common Era (the past 2000 years)^1,2,3,4,5, most of them with annual resolution, have become available along with data compilations^6,7, and model simulations. For the last 600 years, there exist enough high-resolution data – natural proxies (e.g., tree rings, coral, ice cores), documentary evidence, and since 1658 early instrumental measurements – to attempt to generate a monthly global climate reconstruction.

To assess climate variations in the past we can use modeling results and observations, but neither of the two sources can fully capture past climate variability. On the one hand, model simulations can only give a possible range of past climate states; the specific realization of random, internal variability that occurred in the real world cannot be reproduced. On the other hand, observational data become sparser in space and coarser in temporal resolution the earlier the studied period is. Many recent global climate reconstructions use data assimilation to provide a best estimate (the so-called analysis) of past climate fields by combining model simulations and observations^1,2,3,5,8. In this study, we employ an ensemble-based Kalman filter data assimilation approach⁹ and use the newest data compilations with a new ensemble of atmospheric general circulation model simulations¹⁰ to derive monthly global atmospheric paleo-reanalyses from 1421 to 2008.

Previous monthly climate reconstructions generated by our group based on a similar data assimilation technique date back to 1601^2,5. Compared to these reconstructions, the amount of assimilated input data in this study largely exceeds that of previous products; we also improved our data treatment strategy and added a new method to assimilate short time series. Between 1421 and 1657 the observational network is exclusively based on proxy records and documentary evidence. The first long instrumental measurement series become available in Europe starting in 1658. With time the observational network becomes increasingly dense, reaching a peak in the 1880s and 1890s, because after 1890 we do not add any new series to the network and many of these stations stopped measuring sometime in the 20th century. The enlarged input database enables us to produce more robust reconstructions, for example over the winter seasons in the 17th and 18th centuries in the Northern Hemisphere due to the incorporation of documentary evidence and early instrumental data. The assimilated observations (temperature-sensitive and/or precipitation-sensitive proxy records, documentary evidence, and early instrumental measurements over land and marine areas) have indirect or direct information about past temperature, pressure, precipitation, and wind. In addition to these variables, we reconstruct other climate fields of interest such as geopotential height and wind components on several vertical levels. Moreover, we provide an observation feedback archive that makes it possible to trace back the effect of the observations and, at the same time, to publish the input data. In order to quantify the effect of forced variability in the simulation on the reanalysis, we performed two reconstructions: one using an ensemble of transient model simulations as prior (ModE-RA), and one sensitivity experiment with random stationary priors (ModE-RAclim) (Table 1).
A schematic overview of the implemented data assimilation system of ModE-RA is shown in Fig. 1.

Table 1 Summary of main features of the assimilation products and the model simulation.


ModE-RA is systematically evaluated by comparisons with independent time series, gridded instrumental observations, and other climate reconstructions. Most paleo-reanalyses are based on non-transient priors from climate simulations. Their low-frequency variability is derived from the assimilated empirical data. In contrast, the centennial-scale variability in ModE-RA originates from the model response to forcings; therefore this dataset is best suited to study intra-annual to multi-decadal variability. Observations were transformed to anomalies from 71-year running means to reduce trends due to the changing input data network through time and because centennial-scale climate variability is not consistently preserved in many of the assimilated proxy records, for instance due to inconsistent age-trend removal in tree-ring width¹¹. ModE-RA shows good agreement with 20th century gridded monthly products for temperature (globally), sea-level pressure (especially in the Northern Hemisphere), and precipitation (in the regions where precipitation observations were assimilated). Comparing ModE-RA with gridded annual temperature reconstructions, we find the highest correlations with the summer mean (JJA) of ModE-RA. Evaluation against independent documentary evidence shows values very similar to the calibration results, indicating a good performance of ModE-RA.

The overall goal of ModE-RA is to provide a multi-variate dataset that combines all available direct and indirect climate observations with reconstructions of external forcings and the physics represented in climate models. In future studies, ModE-RA can be used to assess climate variability and historical climate events like volcanic eruptions, and to examine past extreme events, such as droughts lasting several months, which may have no counterpart in the modern instrumental period.

Methods

Offline data assimilation

Data assimilation (DA) has been applied in multiple paleoclimatological studies^1,2,3,12, making it possible to use sparse data with a continuously varying observational network and, at the same time, to produce realistic climate fields in accordance with the model physics. Among the various DA techniques, ensemble-based Kalman filter methods have particularly been used in paleoclimate reconstructions^1,2,13,14. Most of the reconstructions of past climate are built on already-existing model simulations and do not propagate the analysis forward in time. This technique is known as offline DA. It has been previously argued that in the case of paleoclimate reconstruction, the predictability of the system is shorter than the temporal resolution of the observations, and no benefit was found from propagating the analysis forward if the focus is on land regions^13,15. A recent study¹⁶ showed that online DA can outperform offline DA when it is coupled with an ocean model. However, applying an offline DA method is computationally much cheaper than the traditional online techniques, and it allows for easy testing. Thus, we keep working with the offline approach.

Various ensemble-based Kalman filter approaches have been developed over the past few decades^17,18,19,20. We use the offline variant of the ensemble square root filter (EnSRF) DA approach⁹ to generate a 600-year-long global monthly atmospheric paleo-reanalysis (ModE-RA). Here, we briefly describe the update part of the EnSRF, in which the observation information is optimally combined with the model simulations. The EnSRF, like other ensemble-based approaches, uses the ensemble statistics to specify the Kalman filter equations. The update step of the EnSRF is divided into an update of the ensemble mean ($\bar{x}$) and an update of the deviations from the ensemble mean (${x}_{i}^{{\prime} }={x}_{i}-\bar{x}$):

$${\bar{x}}^{a}={\bar{x}}^{b}+K\left(y-H{\bar{x}}^{b}\right)$$

(1)

$${x}^{{\prime} a}={x}^{{\prime} b}-\widetilde{K}\left(H{x}^{{\prime} b}\right)$$

(2)

The background state vector (x^b) represents the state of the atmosphere before the assimilation of empirical data. In the case of offline DA, x^b is obtained from existing model simulations. x^a is the analysis, the updated state where observation and model information have been merged. H is the forward operator, which maps the model values to the observed data. K and $\widetilde{K}$ represent the Kalman gain matrix and the reduced Kalman gain matrix, which are calculated as:

$${\rm{K}}={{\rm{P}}}^{{\rm{b}}}{H}^{{\rm{T}}}{(H{{\rm{P}}}^{{\rm{b}}}{H}^{{\rm{T}}}+{\rm{R}})}^{-1}$$

(3)

$$\widetilde{{\rm{K}}}={{\rm{P}}}^{{\rm{b}}}{H}^{{\rm{T}}}{({(\sqrt{H{{\rm{P}}}^{{\rm{b}}}{H}^{{\rm{T}}}+{\rm{R}}})}^{-1})}^{T}\times {(\sqrt{H{{\rm{P}}}^{{\rm{b}}}{H}^{{\rm{T}}}+{\rm{R}}}+\sqrt{{\rm{R}}})}^{-1}$$

(4)

P^b is the background-error covariance matrix, and R is the observation-error covariance matrix. Although it is challenging to quantify the errors both in the model simulations and in the observations, it is important to estimate them accurately, since the update of the state vector depends on their relation. If the observational error variance is smaller than the model error variance, then the state vector is more strongly shifted towards the observation value; otherwise, the observation will have little impact on adjusting the model states. In ensemble-based approaches, the background-error covariances are calculated from the ensemble members¹⁷. No observations are free from errors. Instrumental observations have various error sources^21,22,23, and estimating the error in documentary and proxy data is not straightforward either. How the errors of the different observation types are estimated is discussed in the Observational data section below. It is assumed that observational errors are uncorrelated. Hence, observational data can be assimilated one by one⁹, which further simplifies the assimilation scheme. Previous studies have shown that spurious long-range covariances can appear in the estimated P^b due to the small ensemble size. To avoid updates in the climate fields from distant observations, spatial localization is employed. Localizing P^b limits how far the observational information spreads across grid points and to the other variables in the state vector. The exact implementation of the offline EnSRF is given later in the Experimental design section.
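Because observations are assimilated one by one, Eqs. 1 to 4 reduce to scalar expressions for each observation. The following is a minimal sketch of that serial update, not the ModE-RA implementation: the function name, argument layout, and the optional `loc` weights are illustrative assumptions. For a scalar observation, Eq. 4 simplifies to K multiplied by 1/(1 + sqrt(R/(HP^bH^T + R))).

```python
import numpy as np

def ensrf_update_single_obs(xb, h, y, r, loc=None):
    """Serial EnSRF update with one scalar observation (illustrative sketch).

    xb  : (n_state, n_members) background ensemble
    h   : (n_state,) row of a linear forward operator, so that H x = h @ x
    y   : scalar observed value
    r   : scalar observation-error variance
    loc : optional (n_state,) localization weights applied to P^b H^T
    """
    n_members = xb.shape[1]
    xb_mean = xb.mean(axis=1)
    xb_dev = xb - xb_mean[:, None]

    hx_mean = h @ xb_mean              # H xbar^b
    hx_dev = h @ xb_dev                # H x'^b, one value per member

    # Ensemble estimates of P^b H^T and H P^b H^T
    pb_ht = xb_dev @ hx_dev / (n_members - 1)
    hpb_ht = hx_dev @ hx_dev / (n_members - 1)
    if loc is not None:
        pb_ht = loc * pb_ht

    k = pb_ht / (hpb_ht + r)                           # Eq. 3
    k_tilde = k / (1.0 + np.sqrt(r / (hpb_ht + r)))    # Eq. 4, scalar case

    xa_mean = xb_mean + k * (y - hx_mean)              # Eq. 1
    xa_dev = xb_dev - np.outer(k_tilde, hx_dev)        # Eq. 2
    return xa_mean[:, None] + xa_dev
```

With a small observation error `r` the analysis mean at the observed grid cell moves close to `y` and the ensemble spread there shrinks; with a large `r` the ensemble is left almost unchanged.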

Atmospheric model simulations

As an a priori state, we use ModE-Sim, an ensemble of simulations using the atmospheric general circulation model ECHAM6¹⁰. ECHAM6 was run at T63 spectral horizontal resolution (approximately 1.8° by 1.8°). ModE-Sim is designed to sample internal variability under given boundary conditions and radiative forcings while accounting for uncertainties in these. The ensemble consists of different transient monthly-varying forcings and boundary conditions that account for uncertainties in their reconstruction¹⁰. For the period from 1421 to 1850, ModE-RA is based on ModE-Sim set 1420-3, consisting of 20 members that use sea-surface temperature (SST) reconstructions²⁴ and climatological HadISST sea ice²⁵ as boundary conditions, and PMIP4 radiative forcings²⁶. For the period from 1851 to 2008, ModE-RA is based on ModE-Sim set 1850-1, which consists also of 20 members forced with HadISST2 SSTs and sea ice conditions and PMIP4 standard forcings. Concerning the mean state, ModE-Sim has the typical biases of ECHAM6 in stand-alone mode^10,27. The most prominent difference compared to observations is a warm bias in the Northern Hemisphere mid-latitudes. Another warm bias exists over Australia, and cold biases over South America, India, and the northern Rocky Mountains. Furthermore, the model is too wet over the Himalayas and the Andes. However, despite its mean state biases, ModE-Sim is able to capture the internal variability of our key variables over land regions¹⁰. The performance is particularly good on seasonal-to-interannual timescales with the limitation that the ensemble spread is slightly too large in many regions¹⁰.

The 20 ensemble members were preprocessed by first calculating their ensemble mean, which was then transformed to 71-year monthly running climatologies (35 years before and 35 years after the current year when possible). These climatologies were then subtracted from each member to obtain 71-year running anomalies, consistent with the assimilated observation anomalies. These anomalies form the x^b state vector in the DA framework.
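The anomaly calculation can be sketched for a single grid cell and calendar month as follows. The function names are hypothetical, and the handling of the series edges (simply truncating the window where fewer than 35 years are available on one side) is an assumption about what "when possible" means in practice.

```python
import numpy as np

def running_climatology(series, half_window=35):
    """Centered running mean over up to 2 * half_window + 1 years.

    Assumption: near the start and end of the series the window is
    truncated to the years that exist.
    """
    series = np.asarray(series, dtype=float)
    n_years = series.shape[0]
    clim = np.empty(n_years)
    for t in range(n_years):
        lo = max(0, t - half_window)
        hi = min(n_years, t + half_window + 1)
        clim[t] = series[lo:hi].mean()
    return clim

def ensemble_running_anomalies(ensemble, half_window=35):
    """ensemble: (n_years, n_members) values for one month and grid cell."""
    ensemble = np.asarray(ensemble, dtype=float)
    ens_mean = ensemble.mean(axis=1)                    # ensemble mean per year
    clim = running_climatology(ens_mean, half_window)   # 71-year climatology
    return ensemble - clim[:, None]                     # same climatology for all members
```

Because the same ensemble-mean climatology is subtracted from every member, the spread between members is preserved while the slowly varying mean state is removed.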

Observational data

In order to get the most complete information about the Earth’s climate in the last 600 years from observations, we utilize several data sources such as proxy records, documentary data, and instrumental measurements. Prior to the widespread availability of surface instrumental measurements, past climate changes can be reconstructed from proxies and documentary data. Natural proxy records are available for the full reconstruction period, and instrumental measurements have existed since the mid-17th century. The temporal availability and spatial distribution of the different data types are shown in Figs. 2, 3.

Natural proxy records

Paleoclimatic proxy records provide the earliest information for our climate reconstruction from many regions of the world. Particularly useful for our focus on intra-annual to multi-decadal variability are seasonally-to-annually resolved records. The proxy network used in this study consists of several previous data compilations: PAGES2k data base⁶, OCEAN2k data base^28,29, ISO2k data base³⁰, N-TREND³¹, Briffa/Schweingruber³², Breitenmoser 2014³³, and Neukom 2014³⁴; and involves the assimilation of bivalve, coral, ice, lake sediment, speleothem, and tree proxy records. The PAGES2k data base contains 692 temperature-sensitive records; the major archive type is trees with ring width (TRW) as the most common variable. The OCEAN2k data base is a collection of 57 coral proxies that recorded variations in sea surface temperature. The ISO2k data base includes 759 δ¹⁸O and δ²H isotope records both from continental and marine regions. The N-TREND data base contains 54 tree-ring records from the Northern Hemisphere. The Briffa/Schweingruber dataset is based on 387 tree-ring Maximum Latewood Density (MXD) chronologies; however, we use the version which has been gridded to 5° × 5° in longitude and latitude and consists of 115 grid boxes³⁵. The Breitenmoser 2014 dataset contains 2287 consistently-processed chronologies from the International Tree Ring Data Base. The Neukom 2014 dataset includes 48 marine and 277 terrestrial paleoclimatic records from the Southern Hemisphere.

In our proxy network, most records for the last millennium come from tree rings, which have one time-integrated observation per year, primarily representing the growing season. To incorporate time-integrated information into our monthly climate reconstruction, we choose a half-yearly assimilation window. We use boreal winter (October of year-1 to March), which is the Southern Hemisphere growing season, and boreal summer (April to September), which is the Northern Hemisphere growing season. The other proxy types with annual resolution have been adapted to best fit into this scheme. For instance, ice cores may record a signal during the season of strongest precipitation. Hence, we test the proxy against the climate signal from both seasons. As a result, such non-tree ring proxies can be recorders of none, one, or both seasons.

As mentioned above, we assimilate anomalies. In the first step, all proxy data are high-pass filtered by transforming them to 71-year running anomalies, i.e., each anomaly is calculated with respect to the 35 years before and 35 years after the current year when possible. This transformation is required because low-frequency variability is not retained in many TRW records whose initial goals have not been to reconstruct climate^36,37,38. In the following step, all proxy records are standardized by subtracting their mean and dividing by their standard deviation. Finally, we create multiple linear regression models for each proxy, which serve as a forward operator H in the assimilation. The regression coefficients are estimated based on at least 30 years during the period 1901 to 1970 using the HadCRUT5³⁹ and GPCC v2020⁴⁰ gridded instrumental datasets for temperature and precipitation, respectively. Each month of the growing season serves as an independent variable in the regression model. We only allow for consecutive months to influence the proxies, e.g., tree-growth can depend on May-June precipitation and June-July-August temperature but not on June and August temperature without July. Based on the given information in the corresponding datasets’ publications, we apply different H forward models. See Table 2 for specific details on the proxy forward models. We test 15 temperature-only models and/or 15 precipitation-only models and/or 225 mixed (both temperature and precipitation) models per proxy record. That adds up to a theoretical maximum of 255 models (combinations of months). From these, we identify the best model based on the Akaike Information Criterion (AIC). The error of the proxy record is estimated from the regression model residuals.
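The model search described above can be sketched for the temperature-only case as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, ordinary least squares with an intercept is assumed, and the AIC is written in the common least-squares form n ln(RSS/n) + 2k. Enumerating all runs of consecutive months among five candidate months gives 15 windows, which matches the count quoted above; whether the 15 models per variable arise exactly this way is an assumption.

```python
import numpy as np

def consecutive_windows(n_months):
    """All runs of consecutive month indices, e.g. (0,), (0, 1), (1, 2, 3)."""
    return [tuple(range(start, end + 1))
            for start in range(n_months)
            for end in range(start, n_months)]

def fit_ols_aic(predictors, target):
    """Least-squares fit with intercept; returns coefficients, error variance, AIC."""
    n = len(target)
    design = np.column_stack([np.ones(n), predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    rss = float(resid @ resid)
    k = design.shape[1]
    aic = n * np.log(rss / n) + 2 * k
    return coef, rss / (n - k), aic

def best_forward_model(temp, proxy):
    """temp: (n_years, n_months) monthly anomalies; proxy: (n_years,) standardized record."""
    best = None
    for months in consecutive_windows(temp.shape[1]):
        coef, error_var, aic = fit_ols_aic(temp[:, list(months)], proxy)
        if best is None or aic < best["aic"]:
            best = {"months": months, "coef": coef,
                    "error_var": error_var, "aic": aic}
    return best
```

The selected coefficients would play the role of the forward operator H, and `error_var`, the residual variance, the role of the proxy error. Mixed models would pair every temperature window with every precipitation window, which is where 15 × 15 = 225 combinations come from.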

Table 2 The possible configurations of the proxy forward operator in the different data compilations.


The records in our proxy network furthermore must have a significant climate signal. Therefore, we require the p-value of the F-Test for the regression model to be statistically significant (p < 0.05) and set a threshold for R² > 0.1. Additionally, we have visually inspected the time series of all proxy data and the match of the forward modeled proxy data. In this procedure, we identified and excluded a few records that only exhibit multi-decadal variability despite having annual values. In the final step, duplicates have been removed since proxy records can be part of several databases. Sometimes duplicates may not have identical values because there are multiple versions of the same record, for instance, after different statistical treatments like age-trend removal. Therefore, we keep the best (highest R²) and longest record within a 10 km radius.

Documentary data

Documentary data encompass climate information from historical documents. We use them in the form of indices (temperature, precipitation, wetness/dryness) or phenological data (flowering dates, harvest dates, river or lake freezing and thawing). Documentary data have been used for regional climate reconstructions^41,42,43, but only a few records have so far been included in global datasets such as the PAGES2k data base⁶. A new global documentary data collection was published recently, combining document-based climate evidence from all around the world for the past 600 years, comprising ca. 600 series⁷. Documentary data have a unique importance in climate reconstruction because they have higher temporal resolution than natural proxy records, often monthly, and they cover combinations of seasons and regions (e.g., winter in East Asia) that are otherwise not well covered with natural proxy data. From the ca. 600 available time series in the global documentary data collection, we removed those that did not extend further back than the neighboring meteorological station that would have to be used for calibration. We assimilate phenological data such as flowering date, grape harvest date, leaf coloring date, and freezing or thawing dates of rivers or lakes (or related information such as the first and last ship in a port); specific indices such as sea ice severity, temperature or precipitation indices, and wetness or dryness indices; as well as indices that were already converted to a temperature or precipitation series. Moreover, we used various wind proxies based on marine observations of wind direction. Some of the documentary series in Europe and Asia reach back to the year 1421, to 1722 (or 1640 if we consider Guatemala rainfall index) in North America, to 1540 in South America, and to 1796 (or 1750 if we consider West African Monsoon Index) in Africa.

In order to assimilate documentary data, a forward operator (H) needs to be defined. As for the natural proxies, this was done by calibrating a multiple linear regression model based on gridded climate data or station data and is described in detail in the documentary data collection paper^7,44. All necessary details on the forward model are given in the observation feedback archive files. In short, we use monthly data of the main driving variable (either temperature or precipitation). If an index covers a fixed season or month, this season (seasonal average) or month was chosen, and the forward operator then simply extracts this month or season from the state vector. For phenological dates, we also allowed months prior to the event in question to enter the model. We therefore included all previous months back to the start of the assimilation window. In a backward selection approach, we then retained only significant months (p < 0.1) but kept insignificant months between two significant months to obtain a consistent model. That differs from Burgdorf et al. (2023) only insofar as in the latter paper, all six months prior to a phenological event (defined as the month in which the 90th percentile of the time series lies) were included, whereas in our approach this was restricted to the months within the six-month assimilation window of ModE-RA. As a consequence, there are a few instances in which a series (e.g., spring thawing date) appears both in the boreal winter and in the boreal summer assimilation, as both models individually are significant. Another difference was that all models were calibrated based on anomalies from the 71-year moving average, as they will be assimilated in this form. In a few cases, highly non-Gaussian variables were transformed prior to calibration^7,45,46.

The models were calibrated in an overlapping period that was ideally 70 years long and as early as possible, but we allowed for shorter periods down to 20 years. If the overlap was too short, we standardized the documentary record and scaled it with the standard deviation of the climate field data. The regression coefficients of the final model were then used as coefficients of the forward operator, and the variances of the residuals of the regression were used as an estimation of the observational error.

Instrumental measurements on land

Early meteorological measurements have a key role when studying past climate variations. A new comprehensive data compilation (HCLIM), with the focus on the early instrumental period, includes global, regional, and national databases as well as various other datasets and newly digitized time series⁴⁷. The HCLIM database contains early instrumental time series of global air temperature measurements (3633 series) for 1658–2021, monthly global atmospheric pressure measurements (807) for 1658–2020, monthly global precipitation measurements (4944) for 1677–2021, and monthly number of wet days (3072) for 1586–2019.

The records included in the HCLIM database are quality controlled but not homogenized. However, breakpoints and merging information are provided. We used the breakpoints to prepare the instrumental measurements for the assimilation. The time series were split where breakpoints are indicated and then divided into three categories: (1) long time series, more than 50 years long; (2) medium-length series, between 5 and 50 years; and (3) short time series, less than 5 years long. Therefore, while HCLIM only comprises time series starting before 1890, some of the time series resulting from the splitting will have a later starting date. We chose a simple H forward operator to assimilate the measurements, which only extracts the correct month from the state vector. In addition to the forward operator, the errors in the measurements also need to be provided. We estimated the errors following the methodology of Wartenburger et al.⁴⁸. In some cases, the error had to be inflated, in particular when the monthly means were calculated as the average of the monthly extremes and when they referred to the Julian calendar. For these cases, we used nine sub-daily series from the Palatine Society network⁴⁹ to estimate the inflation. The estimated errors of the measurements are summarized in Table 3. The long time series are transformed to anomalies with respect to 71-year moving climatologies, similar to the proxy records and documentary data, while the medium-length and short series are first left unchanged. Furthermore, if several series of the same instrumental type belong to the same grid cell in the same year, only the ones belonging to the longest category are assimilated.

Table 3 Estimated instrumental errors. k = (1013.25 hPa - p_mean)/100 hPa, where p_mean is the average of the whole series and k is used to linearly increase the error with elevation.


Pressure measurements from ships

Marine near-surface measurements provide information from many regions where we have no other data sources. We use the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Release 3.0 to assimilate pressure measurements⁵⁰. Nighttime marine air temperature and SST measurements have already been incorporated into the SST reconstruction²⁴ used as boundary condition in ModE-Sim¹⁰. ICOADS is a 2° × 2° gridded dataset with monthly resolution starting in 1800. In addition to the monthly values, important metadata (e.g., number of observations per month) are also available. We use the number of observations per month to decide whether the given monthly value per grid cell is a good estimation of monthly means. The number of days with observations can come from multiple ships crossing the same grid box on the same days. However, in the earlier times, it is very unlikely that many ships crossed a grid box on the same day of the month. We set 10 observations per month as a minimum number of observations to use the data as a monthly mean. Additionally, we mask all grid cells that never met the 10 observations per month criterion at any time step until 1890. That leaves us with wide coverage because many grid cells have some observations during the 1880s. Moreover, we further thinned the amount of marine pressure measurements due to the correlated errors in the case of moving ships. In a raster of 3 × 3 grid cells, we only keep the observation at the location where the number of observations per month is the highest.
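The selection of marine pressure values for one month can be sketched as below. This is only an illustration: the function name is hypothetical, the 3 × 3 raster is assumed to consist of non-overlapping tiles aligned with the grid origin, ties are resolved by taking the first maximum, and the additional masking of cells that never reach 10 observations before 1890 is not included.

```python
import numpy as np

def select_ship_cells(n_obs, min_obs=10, block=3):
    """Return a boolean mask of grid cells kept for assimilation.

    n_obs : (n_lat, n_lon) number of observations per cell in one month
    """
    n_obs = np.asarray(n_obs)
    keep = np.zeros(n_obs.shape, dtype=bool)
    n_lat, n_lon = n_obs.shape
    for i in range(0, n_lat, block):
        for j in range(0, n_lon, block):
            tile = n_obs[i:i + block, j:j + block]
            if tile.max() < min_obs:
                continue                      # no cell qualifies as a monthly mean
            di, dj = np.unravel_index(np.argmax(tile), tile.shape)
            keep[i + di, j + dj] = True       # best-sampled cell in this tile
    return keep
```

At most one cell per tile survives, which reduces the number of neighboring observations with correlated errors while keeping the best-sampled location.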

The H forward operator follows the same principle as for land measurements; that is, it extracts the closest grid cell for a given month from the state vector. Observational error variances are estimated to vary between 3 hPa (the value used for land stations) in the case of hundreds of observations per month and 10 hPa in the case of only 10 observations per month. We used an equation similar to what has been used previously for estimating the observational error of ship measurements²⁴:

$${\sigma }_{error}^{2}={\sigma }_{systematic}^{2}+{\sigma }_{random}^{2}/\sqrt{n}$$

(5)

where the systematic error variance (${\sigma }_{systematic}^{2}$) is approximated with 3 hPa, the random error variance (${\sigma }_{random}^{2}$) with 22 hPa, and n denotes the number of observations. We could not calculate the 71-year moving climatologies because of the time gaps at the grid cells, especially in the 19th century. Instead, there is just one climatology for the entire 20th century (Jan 1900 to Dec 1999). It consists of 12 values, one per month, for all grid cells. These climatologies are subtracted from the absolute values to create the anomalies for the assimilation.
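As a quick numerical check of Eq. 5 with the stated values, the short sketch below evaluates the error variance for different observation counts. The function name is illustrative. With n = 10 the result is close to 10, and it approaches 3 as n grows, consistent with the range given above.

```python
import math

def ship_pressure_error_variance(n_obs, var_systematic=3.0, var_random=22.0):
    """Eq. 5: systematic part plus random part divided by sqrt(n)."""
    if n_obs < 1:
        raise ValueError("n_obs must be at least 1")
    return var_systematic + var_random / math.sqrt(n_obs)

for n in (10, 100, 1000):
    print(n, round(ship_pressure_error_variance(n), 2))
```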

Experimental design

In our DA framework we work with anomalies. As already described above, both the model simulations (x^b) and long observations (y) are transformed into anomalies, except for the ship pressure measurements, where anomalies are calculated relative to the 1900–1999 climatology. Since we only assimilate anomalies, the centennial-scale variability in our dataset is influenced by the reaction of the model to prescribed forcings, whereas annual-to-multi-decadal variability is improved based on the assimilated observations. By working with anomalies, we do not correct for existing biases in the model simulations and the reconstructions remain consistent in the model world. This has the advantage of not introducing artificial trends due to temporal changes in the observation data availability.

The implementation of offline DA approaches and the estimation of the background-error covariance matrix vary among the different paleoclimate reconstructions. One offline DA method assumes that P^b is time-invariant^1,3, while in others P^b is forcing-dependent (e.g., amount of volcanic aerosols, ENSO state, etc.) and is recalculated from the precomputed model ensemble for each specific time step². In a further study, time-invariant and transient covariance matrices are combined to offset the relatively small sample size of transient simulations and thereby obtain a better estimation of the errors⁵. Here, we also use a blending technique to calculate the background errors to compensate for the small ensemble size. The blended background errors (P^blend) are calculated by combining a climatological background-error covariance matrix P^clim with the transient one (P^b) as:

$${{\bf{P}}}^{{\bf{blend}}}={\beta }_{1}{{\bf{P}}}^{{\bf{b}}}+{\beta }_{2}{{\bf{P}}}^{{\bf{clim}}}$$

(6)

where P^clim is derived by randomly selecting 100 climate states from the 590-year-long transient model simulations (20 ensemble members × 590 years) for each assimilation window. Based on previous results⁵¹, the weights (β₁, β₂) of the covariance matrices are set to be equal: both β₁ and β₂ are 0.5. Hence, in our DA scheme, P^b is replaced with P^blend in Eqs. 3, 4.
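Eq. 6 can be illustrated with explicit (small) covariance matrices as in the sketch below. The function names are hypothetical, the pool of climate states is represented as a plain array, and sampling without replacement is an assumption. In a serial filter the full matrices would normally not be formed; they are built here only to make the blending visible.

```python
import numpy as np

def sample_covariance(states):
    """Covariance of columns of `states` with shape (n_state, n_samples)."""
    dev = states - states.mean(axis=1, keepdims=True)
    return dev @ dev.T / (states.shape[1] - 1)

def blended_covariance(xb, pool, n_clim=100, beta1=0.5, beta2=0.5, rng=None):
    """Eq. 6: beta1 * P^b + beta2 * P^clim.

    xb   : (n_state, n_members) transient ensemble for the current window
    pool : (n_state, n_pool) all available model states (members x years)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: the climatological states are drawn without replacement
    picks = rng.choice(pool.shape[1], size=n_clim, replace=False)
    p_b = sample_covariance(xb)
    p_clim = sample_covariance(pool[:, picks])
    return beta1 * p_b + beta2 * p_clim
```

The transient part carries the forcing-dependent structure of the current window, while the larger climatological sample reduces sampling noise in the covariance estimate.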

As mentioned above, the impact of an observation on x^b is often limited by localizing the background-error covariance matrix. We implement a widely used localization technique, where each element of the error covariance matrices is multiplied by the respective element of a distance-dependent correlation function. We use a Gaussian localization function:

$$G={\exp }\left(-\frac{{z}^{2}}{2{L}^{2}}\right)$$

(7)

where the distance between two grid boxes is denoted by z, and L stands for the length-scale parameter. We apply a stricter localization to P^b, which is calculated from 20 ensemble members, than to P^clim, which we calculate from 100 members. The applied length-scale parameters are summarized in Table 4. We also set values to 0 where the distance-dependent correlation is <0.001.
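The localization of Eq. 7, including the cut-off at 0.001, can be sketched as follows. The function names and the length scale of 1500 km are hypothetical choices for illustration; the length scales actually applied are those of Table 4.

```python
import math


def gaussian_localization(distance_km, length_scale_km, cutoff=0.001):
    """Gaussian localization weight of Eq. 7, set to 0 below the cutoff."""
    weight = math.exp(-distance_km ** 2 / (2.0 * length_scale_km ** 2))
    return weight if weight >= cutoff else 0.0


def localize(cov, distances_km, length_scale_km):
    """Multiply each covariance element by its distance-dependent weight."""
    return [
        [c * gaussian_localization(z, length_scale_km) for c, z in zip(cov_row, dist_row)]
        for cov_row, dist_row in zip(cov, distances_km)
    ]


# Hypothetical length scale, for illustration only.
L_km = 1500.0
weights = [gaussian_localization(z, L_km) for z in (0.0, 1500.0, 3000.0, 6000.0)]
```

In this example the weight is 1 at zero distance, exp(-0.5) at one length scale, and exactly 0 at four length scales, where the Gaussian value falls below the cut-off.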

Table 4 Length-scale parameters used in the localization of P^b and P^clim matrices. The values are given in km.

In order to be able to track the impacts of observations, we provide an observation feedback archive in which metadata and information about the preparation of the input data are given. The metadata and preparation information are the following: observation and record ID, the name, the coordinates, as well as the altitude. Additionally, we provide the year in which the observation was taken, the year in which it is assimilated, the season, the data type, variable and the unit, the original and transformed value (e.g., interpolation from Julian to Gregorian calendar in early instrumental data or standardization in case of documentary and proxy data) as well as the 71-year moving climatology and anomaly, the reference period and reference dataset, the inherited quality check flags, the observational error variance, the cycle to which the observation belongs and the coefficients of the H forward operator. Regarding the assimilation, for each record we store the flags of the background check and whether the record is assimilated, the length-scale parameters, the background and analysis departure in the same grid cell from each member, the ensemble mean and from the 71-year model climatology, and a bias term of medium and short observations.

In our setup H is always linear and, when necessary, calibrated in the preprocessing steps and stored in the half-yearly (previous year’s October to March and April to September) observational input files together with the observational error variances. The x^b state vector is built from 6 monthly fields of several variables. Therefore, we not only apply a localization in space but also in time; that is, we allow the observations to update the monthly fields based on the H forward operator. In the case of monthly instrumental measurements, these will have an influence only on one month. Seasonal documentary and proxy data, however, can affect several months. After preparing the observations and the monthly anomalies of the ensemble members, in the first step we assimilate all proxy records, documentary data and marine measurements as well as long (>50 years) instrumental measurements. Before a non-marine observation is assimilated, we first check whether it lies within ±5 times the square root of the sum of model and observation variances $\left(\sqrt{{\boldsymbol{H}}{{\bf{P}}}^{{\bf{b}}}{{\boldsymbol{H}}}^{T}+{\bf{R}}}\right)$ of the model ensemble mean. In the case of marine observations a stricter filter was implemented because we wanted to give less weight to pressure measurements from ships. Therefore, a marine sea-level pressure record has to be within ±2 times this value of the model ensemble mean. If an observation does not pass the prescribed quality check, it is assimilated passively; that is, the climate state is not updated by the observation but its potential impact is stored in the observation feedback archive file. Furthermore, instrumental measurements of the same type are averaged if more than one observation can be found in the same grid box, and then their average is assimilated.
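The background check described above can be illustrated for a single scalar observation. This is a sketch under stated assumptions: the function name is hypothetical, the model variance is estimated as the sample variance of the simulated observations (H applied to each ensemble member), and the numbers are made up.

```python
import math
import statistics


def passes_background_check(obs, simulated_obs, obs_error_var, k=5.0):
    """Check |y - mean(H x^b)| <= k * sqrt(H P^b H^T + R) for one observation.

    simulated_obs holds H x^b for every ensemble member, so its sample
    variance serves as an estimate of H P^b H^T.
    """
    ens_mean = statistics.fmean(simulated_obs)
    model_var = statistics.variance(simulated_obs)
    threshold = k * math.sqrt(model_var + obs_error_var)
    return abs(obs - ens_mean) <= threshold


# Made-up anomalies from a five-member toy ensemble.
simulated = [-0.4, 0.1, 0.3, -0.2, 0.2]
land_ok = passes_background_check(1.5, simulated, obs_error_var=0.25)            # factor 5
marine_ok = passes_background_check(1.5, simulated, obs_error_var=0.25, k=2.0)   # stricter factor 2
```

In this toy case the same departure passes the check with the factor 5 but fails with the stricter factor 2; a failing observation would then only be assimilated passively.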

The assimilation is paused after assimilating all proxies, documentary data, and instrumental records longer than 50 years in the first cycle (cycle1). The 5 to 50 year-long instrumental time series are debiased before they are assimilated in the second cycle (cycle2) (Fig. 1). For debiasing we use a simple approach of fitting the first two harmonics to the annual cycle of the observation and of the ensemble mean of the cycle1 analysis. The annual cycles of the cycle1 analysis and the medium-length time series (belonging to cycle2) are calculated from the months where measurements are available. In most cases there is good agreement between the annual cycle of the cycle1 analysis and the medium-length observations in the temperature and pressure time series, and the harmonic fits approximate the annual cycles well. The annual cycles of the number of wet days and of precipitation calculated from the cycle1 analysis and the observations agree less well, and the harmonic fits follow the month-to-month variations less closely. In cycle2, the time series are transformed to anomalies as:

$${x}_{anom}={x}_{orig}-bias-{model}_{clim}$$

(8)

where x_orig is the original time series, bias is the monthly bias between the fitted annual cycles and model_clim is the 71-year running climatology of the model. From this point, the observation goes through the same steps (quality check, averaging) as in cycle1, before it is assimilated. In cycle2 the model simulations are replaced with the cycle1 analysis as the background state and the covariance matrices are also recalculated from it. After assimilating all observations in cycle2, we pause the assimilation again and the biases in the short records (<5 years) are calculated in the same way as for the cycle2 observations, by fitting the harmonics to the ensemble mean of the cycle2 analysis and to the observations. Then, using Eq. 8, the short records are transformed to anomalies and assimilated. The cycle3 analysis is the final product, the ModE-RA paleo-reanalysis.
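The debiasing step and Eq. 8 can be sketched as follows. Several points are assumptions made only for this illustration: the annual cycle is taken to be complete (12 evenly spaced monthly values), in which case the least-squares fit of the first two harmonics reduces to a discrete Fourier projection; the bias is taken as the fitted observed cycle minus the fitted analysis cycle; and all function names are hypothetical.

```python
import math


def fit_two_harmonics(annual_cycle):
    """Fit a mean plus the first two harmonics to a complete annual cycle."""
    n = len(annual_cycle)
    fitted = [sum(annual_cycle) / n] * n
    for k in (1, 2):
        angles = [2.0 * math.pi * k * m / n for m in range(n)]
        a = 2.0 / n * sum(y * math.cos(w) for y, w in zip(annual_cycle, angles))
        b = 2.0 / n * sum(y * math.sin(w) for y, w in zip(annual_cycle, angles))
        fitted = [f + a * math.cos(w) + b * math.sin(w) for f, w in zip(fitted, angles)]
    return fitted


def monthly_bias(obs_cycle, analysis_cycle):
    """Bias per calendar month (assumed sign: observation minus analysis)."""
    fit_obs = fit_two_harmonics(obs_cycle)
    fit_ana = fit_two_harmonics(analysis_cycle)
    return [o - a for o, a in zip(fit_obs, fit_ana)]


def to_anomaly(x_orig, bias, model_clim):
    """Eq. 8 for a single monthly value."""
    return x_orig - bias - model_clim


# Toy annual cycles: the observation is uniformly 1.5 units above the analysis.
analysis_cycle = [10.0 + 8.0 * math.cos(2.0 * math.pi * (m - 6) / 12) for m in range(12)]
obs_cycle = [value + 1.5 for value in analysis_cycle]
bias = monthly_bias(obs_cycle, analysis_cycle)
```

Because the fit is linear, a constant offset between the two annual cycles is recovered as the same bias in every month. Records with missing months would need a general least-squares fit instead of the projection used here.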

In addition to the ModE-RA paleo-reanalysis, we generated another reanalysis ModE-RAclim to allow the user to disentangle the effect of observations and boundary conditions (forcings) on ModE-RA. For ModE-RAclim the members of the x^b state vector are randomly selected from the 589 year-long 20 transient ModE-Sim simulations. We use 100 randomly selected members at each half year in the assimilation from which the P^b background-error covariance matrix is calculated. This approach is similar to other reconstructions^1,3 with the exception that they always use the same x^b and P^b and work with absolute values. Note that the random sampling used for the ModE-RAclim prior largely averages out the forced signal arising from the forcings and boundary conditions in ModE-Sim and the assimilated observations have been high-pass filtered. Variability at scales longer than 71 years is not present in this sensitivity experiment. For the localization of the background-error covariance matrix the larger length-scale parameters were employed (Table 4, right column). Observations are handled as in the ModE-RA assimilation, assimilating them in three cycles.
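The random selection of prior states for ModE-RAclim can be pictured with a short sketch. The year range, the sampling without replacement, and the function name are assumptions for illustration; in the procedure described above a new set of 100 states would be drawn for each half-year assimilation window.

```python
import random


def sample_clim_prior(n_members=20, first_year=1421, last_year=2008, n_states=100, seed=None):
    """Draw n_states distinct (member, year) pairs from the pooled simulations."""
    pool = [
        (member, year)
        for member in range(1, n_members + 1)
        for year in range(first_year, last_year + 1)
    ]
    return random.Random(seed).sample(pool, n_states)


# One draw, as it could be made for a single assimilation window.
prior_states = sample_clim_prior(seed=42)
```

Since the selected years are unrelated to the year being reconstructed, the forced signal of any particular year is largely averaged out in such a prior.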

Data Records

Most of the input datasets are publicly available and can be downloaded from the following data repositories: PAGES2k records can be found on figshare⁵², Ocean2k⁵³ and Iso2k⁵⁴ can be downloaded from NOAA/WDS Paleoclimatology; the N-TREND dataset can be found on the project website⁵⁵; the gridded Briffa/Schweingruber tree-ring dataset can be downloaded from the Climatic Research Unit⁵⁶; the processed tree-ring chronologies by Breitenmoser 2014 were used in a previous climate reconstruction and are available within the EKF400 project⁵⁷; DOCU-CLIM is accessible on BORIS⁵⁸; HCLIM can be downloaded from PANGAEA⁵⁹; ICOADS data are provided by the NOAA PSL, Boulder, Colorado, USA, from their website (https://psl.noaa.gov). The record used from the Neukom 2014 dataset is included in the observation feedback archive and uploaded to WDCC (https://doi.org/10.26050/WDCC/ModE-RA_s14203-18501).

Technical Validation

The quality of the ModE-RA paleo-reanalysis is assessed by comparisons with multiple data sources, depending on their availability through time. We start with comparisons to gridded instrumental datasets in the 20^th century. Because these are not fully independent and the quality of ModE-RA changes through time, we additionally compare ModE-RA to the 20^th Century Reanalysis version 3⁶⁰ (20CRv3) and to mostly proxy-based reconstructions^1,3,4,42. Finally, we compare ModE-RA to completely independent documentary information. As metrics for evaluation, we use the correlation and the root mean square error skill score (RMSESS). The RMSESS is calculated as:

$$RMSESS=1-\frac{\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{\left({x}_{i}^{u}-{x}_{i}^{ref}\right)}^{2}}}{\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{\left({x}_{i}^{f}-{x}_{i}^{ref}\right)}^{2}}}{\rm{,}}$$

(9)

where we replace x^u with the ensemble mean of the paleo-reanalysis (ModE-RA) (${\bar{x}}^{a}$); x^f is the ensemble mean of the model simulations (ModE-Sim) (${\overline{x}}^{b}$), i.e., this skill score measures the improvement in comparison to ModE-Sim, which already has some reconstruction skill because it follows the external forcings and boundary conditions. The time step is denoted with i; we calculate the RMSESS over the 1901–2000 period. For the 20^th century validation, we compare the ensemble mean of ModE-RA against three reference datasets for temperature, precipitation, and sea level pressure (x^ref): HadCRUT5³⁹, GPCC version 2022^14,61, and HadSLP2⁶². These datasets are not entirely independent from the paleo-reanalysis, but only time series starting before 1890 (before the breakpoint detection) were assimilated in ModE-RA. The skill metrics are calculated using anomalies relative to the 1961–1990 period. All three validation datasets were remapped to the spatial resolution of the model simulations. Although ModE-RA has monthly resolution, we present half-year averages in the evaluation because of similar absolute values and very small differences between single months (see Figs. S1–S3). Differences between the two half-years are more relevant because many proxies, which represent a growing season, have only been assimilated in one of the half-year assimilation windows.
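For a single grid cell, Eq. 9 can be evaluated as in the following sketch. The function names and the four-value toy series are illustrative assumptions.

```python
import math


def rmse(series, reference):
    """Root mean square error of a series against the reference."""
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(series, reference)) / len(reference))


def rmsess(analysis, model, reference):
    """Eq. 9: 1 - RMSE(analysis, reference) / RMSE(model, reference)."""
    return 1.0 - rmse(analysis, reference) / rmse(model, reference)


# Made-up anomaly series at one grid cell.
reference = [0.0, 1.0, -1.0, 0.5]   # e.g. a gridded instrumental dataset
model = [0.5, 0.0, 0.0, 0.0]        # ensemble mean of the simulations
analysis = [0.1, 0.8, -0.7, 0.4]    # ensemble mean of the reanalysis
skill = rmsess(analysis, model, reference)
```

A value of 1 corresponds to a perfect match with the reference, 0 to no improvement over the model simulations, and negative values to a degradation relative to them.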

Temperature correlations between ModE-RA and the gridded instrumental dataset are globally high (Fig. 4a,d). Note that the HadCRUT5 dataset is a blended product and uses the HadSST4 sea-surface temperature dataset⁶³, which is also one of the boundary conditions of ModE-Sim; therefore the high correlation values over oceans are expected (Fig. 4c,f). Correlations between ModE-Sim and HadCRUT5 are also mainly positive over land due to the 20^th-century warming trend, but after assimilating the observations an almost perfect correlation is achieved, especially in the Northern Hemisphere and Australia (Fig. 4a,d). For pressure, ModE-Sim has the strongest correlation with HadSLP2 over the tropical ocean (Fig. 5c,f). The assimilation improves the sea-level pressure fields the most over the Northern Hemisphere, in southern South America and a larger region around Australia (Fig. 5a,d). In contrast to temperature and sea level pressure, where the forcings and boundary conditions introduced positive correlations in ModE-Sim, the correlation coefficients of ModE-Sim for precipitation are close to zero (Fig. 6c,f). Hence, most information is gained by the assimilation (Fig. 6a,d). Because of the narrower localisation length scale (Table 4), correlation coefficients of precipitation improved in the regions where most of the precipitation data were assimilated.

In addition to analyzing the performance of ModE-RA, the skill of ModE-RAclim is also shown in the figures (Figs. 4b,e, 5b,e, 6b,e). The reconstructed temperature fields of ModE-RAclim have similar skill over land, where the skill comes from data assimilation. ModE-RA outperforms ModE-RAclim over marine regions, where the skill comes from the SST boundary conditions. The sea-level pressure reconstruction of ModE-RAclim again yields skill similar to that of ModE-RA, but ModE-RA has higher skill over the tropical Pacific, where the information again comes from the SST boundary conditions. The skill of the ModE-RAclim precipitation field is comparable to ModE-RA because ModE-Sim has correlations close to zero and most information is gained by the assimilation.

To evaluate the skill of ModE-RA compared to the model simulation, the RMSESS is calculated by using the ensemble mean of ModE-Sim in the denominator (x^f in Eq. 9). With the help of the RMSESS metric we can gain further insights into how well the amplitudes of climate variability are reconstructed. The temperature fields show a notable improvement over ModE-Sim in the extra-tropical land areas of the Northern Hemisphere, the southern part of South America, and Australia (Fig. 7a,b). Sea-level pressure is better reconstructed in the October to March half year than in April to September. The largest increases in the RMSESS can be found over the North Atlantic, Europe, and the Indian Ocean (Fig. 7c). The reconstructed precipitation fields show improved skill mainly in the regions where precipitation measurements were assimilated in both half years (Fig. 7e,f).

The 20^th century validation highlights the quality of ModE-RA at the end of the 19^th century because no stations that started measuring after 1890 were included anymore. As seen in Fig. 2, the number of assimilated observations decreases back in time and simultaneously the spatial coverage and density shrink. The ensemble standard deviation ratio (standard deviation of the ModE-RA ensemble divided by the standard deviation of the ModE-Sim ensemble) highlights where the assimilation added information. This ratio, i.e., the ensemble standard deviation remaining after data assimilation relative to the prior standard deviation, is shown in Fig. 8 for six time slices between 1450 and 1950 and for two months (January: a,c,e,g,i,k and July: b,d,f,h,j,l). There are few observations assimilated in the October to March season before the year 1500. First observations for the boreal winter season become available in the 16^th century. In the 18^th century, the spatial distribution extends into eastern North America. In contrast, there is already a dense proxy network in the Northern Hemisphere extra-tropical land region during the April to September season in the year 1450. This network becomes denser through time. Assimilated information in the Southern Hemisphere before the year 1800 is sparse. Nevertheless, ModE-RA includes more observations from the 19^th and early 20^th century than any other currently existing gridded global dataset for data sparse regions such as South America or Africa.

To further evaluate ModE-RA beyond the 20^th century, we compare regional-scale variations of the ModE-RA ensemble mean with other reconstructions. Several annual global temperature field reconstructions based on natural climate proxies exist over the Common Era. In a recent study, annual global temperature was generated with six different climate field reconstruction techniques (Analogue method (AM), Canonical correlation analysis (CCA), Composite plus scaling (CPS), Data assimilation (DA), GraphEM (GEM), and Principal-Component Regression (PCR)), all using the same multiproxy input data⁶ over the Common Era⁴. Additionally, we compare ModE-RA to the global temperature reconstructions with annual resolution from the Last Millennium Reanalysis (LMR), produced with a similar offline DA method and assimilating multiproxy records¹. A reconstruction of paleo-hydroclimate (PHYDA) - reconstructed with an offline DA method using proxy records - provides both annual and seasonal climate fields of several variables³. Furthermore, a monthly temperature reconstruction for European land areas is also available starting in 1659, which is based on multiproxy records, documentary evidence, and instrumental data (hereafter L2004⁴²).

For the pre-20^th century, we focus on the European land area (25°W–40°E and 35°N–70°N, as defined in L2004). The correlation was calculated at each grid cell between the reconstructions and the ensemble mean of ModE-RA in the overlapping period after the reconstructions were remapped to the resolution of ModE-RA. In the six reconstructions⁴ a year is defined as April to March, similar to the annually resolved PHYDA reconstruction³. Therefore, annual values from ModE-RA and L2004⁴² were calculated correspondingly from the monthly April to March data. Correlation between LMR and ModE-RA in the annual comparison is calculated between January and December. All annual reconstructions based on multiproxy records mainly use information from tree rings. These primarily provide information for the summer season. Hence, we calculated the correlation between the annual reconstructions and seasonal temperature field from ModE-RA using the months June, July and August (JJA). When the temperature reconstructions are available with higher temporal resolution such as in the case of PHYDA and L2004, we use them to calculate the seasonal correlation. Annual mean correlations between ModE-RA and the other reconstructions are rather low with the exception of L2004 (Fig. 9). When the correlations are calculated between the annual reconstructions (AM, CCA, CPS, DA, GEM, PCR, LMR) and the seasonal summer mean of ModE-RA, the correlation coefficients are higher (Fig. 9), suggesting that previous annual temperature reconstructions are biased towards the summer seasons. The correlation is higher between the boreal summer PHYDA reconstruction and ModE-RA than the annual reconstruction. In contrast, L2004 shows weaker correlations in the boreal summer than in the annual mean comparison (Fig. 9). The reason behind a weaker correlation in boreal summer may be that the annual mean is dominated by winter variability.

Only a few datasets have monthly or higher resolved information in the period before the year 1900. Besides L2004, 20CRv3 spans the 1836–2015 time period and was generated with a DA technique using pressure observations and a weather forecast model⁶⁰. A more experimental phase of 20CRv3 between 1806 and 1835 is also available. Here, we use L2004 and 20CRv3 to compare the area-weighted monthly temperature anomalies, i.e., after removal of the annual cycle and relative to 1821–1850 over the European land areas in the preindustrial period. The correlation coefficient between ModE-RA and L2004 European average temperature over the 1659–1850 period is 0.27, and over the 1806–1850 period is 0.28. The correlation between ModE-RA and 20CRv3 over the 1806–1850 period is 0.67. We do not expect high correlations because L2004 is a reconstruction based on a smaller dataset with a purely statistical principal component regression method and in 20CRv3 only surface pressure is assimilated and no temperature information.
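The comparison of area-weighted regional averages can be sketched as follows. The cosine-of-latitude weighting is a common choice for regular latitude-longitude grids and is an assumption here, as are the function names and the made-up anomaly values.

```python
import math


def area_weighted_mean(values, latitudes_deg):
    """Mean of grid-cell values weighted by the cosine of latitude."""
    weights = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy anomaly fields: three grid cells (40, 50, 60 degrees N), four time steps.
lats = [40.0, 50.0, 60.0]
field_a = [[0.2, 0.1, 0.4], [-0.3, -0.1, -0.2], [0.5, 0.6, 0.3], [-0.1, 0.0, -0.2]]
field_b = [[0.1, 0.2, 0.3], [-0.2, -0.2, -0.1], [0.4, 0.5, 0.5], [0.0, -0.1, -0.1]]
series_a = [area_weighted_mean(step, lats) for step in field_a]
series_b = [area_weighted_mean(step, lats) for step in field_b]
r = pearson_r(series_a, series_b)
```

The weighting prevents the smaller high-latitude grid cells from contributing as much to the regional mean as the larger cells further south.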

Additionally, we calculated the correlation between the 500 hPa geopotential height in the ModE-RA ensemble mean and the 500 hPa geopotential height in 20CRv3 from 1841 to 2000 for each decade. These fields are only updated through the data assimilation framework because no upper air observations were assimilated. The correlations are calculated based on monthly values relative to the 1961–1990 climatologies, separately for the Northern and the Southern Hemisphere using the ensemble means. In general, there is better agreement between ModE-RA and 20CRv3 over the Northern Hemisphere, with the median of correlations strongly increasing until the 1911–1920 decade; thereafter the changes from decade to decade become relatively small (Fig. 10). The agreement between ModE-RA and 20CRv3 in the Southern Hemisphere increases throughout the examined period with a slowdown from 1901 to 1950.

Furthermore, we used 71 independent documentary records (Table S1) from the documentary data compilation⁷ to evaluate ModE-RA in earlier times. The independent documentary data are first calibrated with observation-based datasets as described in the documentary data compilation paper⁷, and then these models are applied to ModE-RA for estimating the documentary observations. We found strong correlations between the independent data and the ensemble mean of ModE-RA, except for a few series, especially in North America (Fig. 11a). Comparing the correlations of ModE-RA with the calibration statistics, they are almost on a 1:1 line (Fig. 11b).

The observation feedback archive can also be used to analyze individual cases. Here, we present three examples of extremely positive and negative Palmer drought severity indices (PDSI)⁶⁴ calculated using the temperature and precipitation fields from ModE-RA. The PDSI calculation requires three inputs: potential evapotranspiration (PET), precipitation, and latitude. We estimate PET with the Thornthwaite equation⁶⁵, which is solely based on monthly mean temperature, in our case from ModE-RA. This PET estimate is then used together with monthly mean precipitation sums to calculate PDSI. The original version of PDSI was calibrated to conditions in central North America. Here, we use the self-calibrating version of PDSI, which adjusts parameters in the PDSI calculations to local conditions around the world⁶⁶. We compared the PDSI of ModE-RA with two mainly tree-ring based reconstructions, with PHYDA³ and the Old World Drought Atlas (OWDA)⁶⁷. All three examples represent the boreal summer season.

1. In the summer 1789 in Central Europe, the PDSI values are negative in PHYDA and OWDA, while they are positive in ModE-RA. In the observation feedback archive we found that assimilated early instrumental precipitation measurements and information about the number of wet days per month for the region around 9°E and 49°N point to a strong wet anomaly. This is confirmed by a negative pressure anomaly in early instrumental data. In this case, there is strong support that ModE-RA can be trusted.
2. In the year 1780, ModE-RA shows a strong wet anomaly in the region of Inner Mongolia, around 101° E, 40° N. In the observation feedback archive we found only one precipitation-sensitive tree-ring record in the area at that time. It turns out that the strong positive precipitation anomaly, as well as a strong cold anomaly, are already present in ModE-Sim, which was not significantly modified by the assimilation of this single tree-ring record. We could not identify an external forcing which could be responsible for this anomaly. One possible explanation could be that all members of our relatively small ensemble are coincidentally too moist in this case. Such insights can be gained from the comparison of the model simulations with the paleo-reanalysis.
3. Finally, a drought in summer 1540 in Central Europe is reconstructed by both the OWDA and PHYDA, although with different intensities. PHYDA suggests a moderate drought whereas OWDA points to more extreme conditions. In ModE-RA, we assimilated documentary data in addition to the tree-ring proxies. These documentary indices show strong negative precipitation anomalies and positive temperature anomalies⁶⁸. Due to the assimilation of this additional documentary evidence, ModE-RA PDSI supports the stronger drought conditions reconstructed by the OWDA.
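The temperature-based PET estimate that enters the PDSI calculation can be illustrated with the classical Thornthwaite formulation. This is a simplified sketch and not the implementation used for ModE-RA: the function name is hypothetical, the day length (which depends on latitude) must be supplied by the caller and defaults to 12 h, and the adjustment usually applied to very warm months is omitted.

```python
def thornthwaite_pet(monthly_temp_c, day_length_h=None, days_in_month=None):
    """Monthly PET (mm) from 12 monthly mean temperatures (deg C).

    day_length_h and days_in_month default to 12 h and 30 days, which
    gives the unadjusted Thornthwaite estimate.
    """
    if len(monthly_temp_c) != 12:
        raise ValueError("expected 12 monthly mean temperatures")
    day_length_h = day_length_h or [12.0] * 12
    days_in_month = days_in_month or [30.0] * 12
    temps = [max(t, 0.0) for t in monthly_temp_c]         # no PET at or below 0 deg C
    heat_index = sum((t / 5.0) ** 1.514 for t in temps)   # annual heat index
    if heat_index == 0.0:
        return [0.0] * 12
    alpha = (6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2
             + 1.792e-2 * heat_index + 0.49239)
    return [
        16.0 * (n / 12.0) * (d / 30.0) * (10.0 * t / heat_index) ** alpha
        for t, n, d in zip(temps, day_length_h, days_in_month)
    ]


# Made-up monthly mean temperatures for a mid-latitude site (January to December).
temps_c = [-2.0, 0.5, 4.0, 9.0, 14.0, 17.5, 19.5, 19.0, 15.0, 10.0, 4.5, 0.0]
pet_mm = thornthwaite_pet(temps_c)
```

The resulting monthly PET values would then be combined with monthly precipitation sums in the PDSI calculation.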

Usage Notes

ModE-RA (ensemble members and statistics) and ModE-RAclim (ensemble statistics) are uploaded to NOAA and to the World Data Center for Climate (WDCC) at Deutsches Klimarechenzentrum in Hamburg, Germany (https://doi.org/10.26050/WDCC/ModE-RA_s14203-18501 and https://doi.org/10.26050/WDCC/ModE-RAc_s14203-18501)⁶⁹. The two climate reconstructions are in NetCDF4 format; the NetCDF4 files cover the whole period per variable. Ensemble statistics include the mean (monthly anomalies with respect to the period 1901 to 2000), maximum, minimum and spread in terms of one standard deviation from the ensemble mean. The observation feedback archive files are available in tsv format (one file per 6 months), which contain all relevant information of the input data, how the input data were processed, and useful feedback information from the DA system. A detailed list of all information stored in the feedback archive has been published with the dataset: https://www.wdc-climate.de/ui/entry?acronym=ModE-RA_info. We provide an online visualisation tool to plot maps, timeseries and the locations of assimilated data: http://mode-ra.giub.unibe.ch/climeapp.

The ModE-RA paleo-reanalysis is identical to the ModE-Sim simulations⁷⁰ in areas far away from any assimilated observations, especially at the beginning of the reconstruction period. With time, more and more observations are available, suggesting that the reconstruction becomes more skillful. Therefore, users should first check how reliable the paleo-reanalysis is for a given region and time period. This can be achieved by looking at the ensemble spread and the differences between ModE-Sim and ModE-RA. Among the reconstructed variables, the ones with observational input data are the most realistically estimated. We encourage users to make use of the ensemble members and not only the ensemble mean.

ModE-Sim was generated in two phases (1420–1850 and 1850–2009) with different boundary conditions¹⁰. In the earlier period, ModE-RA is based on ModE-Sim Set 1420-3, and in the later period on ModE-Sim Set 1850-1. ModE-RA is not split into the two periods of the ModE-Sim prior because the assimilated observational time series lead to a smooth transition between the two periods of the ModE-Sim sets.

ModE-RA was generated by transforming both model simulations and observations to 71-year running anomalies. Hence, users should be aware that the centennial-scale variability is the model response to forcings. Therefore, we see great potential for future research, particularly in terms of intra-annual to multi-decadal variability. We provide monthly anomalies with respect to the 1901 to 2000 climatology and the model climatology for the 1901 to 2000 period. Be aware that the model climatology includes model biases. Therefore, we recommend using anomalies instead of absolute values.

Furthermore, because of the employed setup, unrealistic values (such as negative precipitation) can occur if absolute values are generated by adding back a climatology. This is especially an issue in arid regions where monthly precipitation is not normally distributed. Precipitation is consistent in the periods of 1421–1800 and 1900–2008 when the observational network is quite stable, but in the 19^th century, when many of the observation time series start, a trend is introduced in some arid land regions and tropical oceans (Fig. S4). Hence, in the case of the reconstructed precipitation fields, the early and late period should be looked at separately.

ModE-RAclim should be seen as a sensitivity study and is only a side product of the project. ModE-RAclim does not contain centennial scale climate variability. For most users, the main product ModE-RA therefore should be used for regular studies on past climate. The main differences between ModE-RAclim and ModE-RA are on the model side: ModE-RAclim uses 100 randomly picked years from ModE-Sim as a priori state. Thereby, stationarity in the covariance structure is assumed, and the externally-forced signal in the model simulations is eliminated. In combination with ModE-Sim and ModE-RA it can be used to distinguish the forced and unforced parts of climate variability seen in ModE-RA.

ModE-RA makes use of several data compilations and assimilates a wider range of direct and indirect sources of past climate information than 20CRv3. Hence, if monthly resolution is sufficient for the planned study, ModE-RA may already have higher quality from 1850 backwards for analyzing past climate changes and can be viewed as the backward extension of 20CRv3.

Code availability

The R code for the quality control of the observational data and for their assimilation to generate the ModE-RA ensemble can be found together with the entire ModE-RA dataset and observation feedback archive at the World Data Center for Climate (WDCC) at Deutsches Klimarechenzentrum in Hamburg, Germany (https://doi.org/10.26050/WDCC/ModE-RA_s14203-18501).

References

Hakim, G. J. et al. The last millennium climate reanalysis project: Framework and first results. Journal of Geophysical Research: Atmospheres 121, 6745–6764, https://doi.org/10.1002/2016JD024751 (2016).
Franke, J., Brönnimann, S., Bhend, J. & Brugnara, Y. A monthly global paleo-reanalysis of the atmosphere from 1600 to 2005 for studying past climatic variations. Scientific Data 4, 170076, https://doi.org/10.1038/sdata.2017.76 (2017).
Steiger, N. J., Smerdon, J. E., Cook, E. R. & Cook, B. I. A reconstruction of global hydroclimate and dynamical variables over the Common Era. Scientific data 5, 1–15, https://doi.org/10.1038/sdata.2018.86 (2018).
Neukom, R., Steiger, N., Gómez-Navarro, J. J., Wang, J. & Werner, J. P. No evidence for globally coherent warm and cold periods over the preindustrial Common Era. Nature 571, 550–554, https://doi.org/10.1038/s41586-019-1401-2 (2019).
Valler, V., Franke, J., Brugnara, Y. & Brönnimann, S. An updated global atmospheric paleo-reanalysis covering the last 400 years. Geosc. Data J. 1–19, https://doi.org/10.1002/gdj3.121 (2021).
Emile-Geay, J. et al. A global multiproxy database for temperature reconstructions of the Common Era. Scientific data 4, 170088, https://doi.org/10.1038/sdata.2017.88 (2017).
Burgdorf, A.-M. et al. DOCU-CLIM: A global documentary climate dataset for climate reconstructions. Scientific data 10, 402, https://doi.org/10.1038/s41597-023-02303-y (2023).
Tardif, R. et al. Last Millennium Reanalysis with an expanded proxy database and seasonal proxy modeling. Climate of the Past 15, 1251–1273, https://doi.org/10.5194/cp-15-1251-2019 (2019).
Whitaker, J. S. & Hamill, T. M. Ensemble data assimilation without perturbed observations. Monthly Weather Review 130, 1913–1924, https://doi.org/10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2 (2002).
Hand, R., Samakinwa, E., Lipfert, L. & Brönnimann, S. ModE-Sim – a medium-sized atmospheric general circulation model (AGCM) ensemble to study climate variability during the modern era (1420 to 2009). Geoscientific Model Development 16, 4853–4866, https://doi.org/10.5194/gmd-16-4853-2023 (2023).
Esper, J., Cook, E. R. & Schweingruber, F. H. Low-frequency signals in long tree-ring chronologies for reconstructing past temperature variability. Science 295, 2250–2253, https://doi.org/10.1126/science.1066208 (2002).
Goosse, H. et al. Reconstructing surface temperature changes over the past 600 years using climate model simulations with data assimilation. Journal of Geophysical Research: Atmospheres 115, https://doi.org/10.1029/2009JD012737 (2010).
Bhend, J., Franke, J., Folini, D., Wild, M. & Brönnimann, S. An ensemble-based approach to climate reconstructions. Climate of the Past 8, 963–976, https://doi.org/10.5194/cp-8-963-2012 (2012).
Schneider, U., Becker, A., Finger, P., Rustemeier, E. & Ziese, M. GPCC Full Data Monthly Product Version 2020 at 0.25°: Monthly Land-Surface Precipitation from Rain-Gauges Built on GTS-Based and Historical Data. Global Precipitation Climatology Centre (GPCC) at Deutscher Wetterdienst https://doi.org/10.5676/DWD_GPCC/FD_M_V2022_025 (2020).
Matsikaris, A., Widmann, M. & Jungclaus, J. On-line and off-line data assimilation in palaeoclimatology: a case study. Climate of the Past 11, 81–93, https://doi.org/10.5194/cp-11-81-2015 (2015).
Okazaki, A., Miyoshi, T., Yoshimura, K., Greybush, S. J. & Zhang, F. Revisiting online and offline data assimilation comparison for paleoclimate reconstruction: an idealized OSSE study. Journal of Geophysical Research: Atmospheres 126, e2020JD034214, https://doi.org/10.1029/2020JD034214 (2021).
Evensen, G. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans 99, 10143–10162, https://doi.org/10.1029/94JC00572 (1994).
Houtekamer, P. L. & Mitchell, H. L. Data assimilation using an ensemble Kalman filter technique. Monthly Weather Review 126, 796–811, https://doi.org/10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2 (1998).
Anderson, J. L. An ensemble adjustment Kalman filter for data assimilation. Monthly Weather Review 129, 2884–2903, https://doi.org/10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2 (2001).
Bishop, C. H., Etherton, B. J. & Majumdar, S. J. Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review 129, 420–436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2 (2001).
Brugnara, Y. et al. A collection of sub-daily pressure and temperature observations for the early instrumental period with a focus on the “year without a summer” 1816. Climate of the Past 11, 1027–1047, https://doi.org/10.5194/cp-11-1027-2015 (2015).
Kent, E. C. & Kennedy, J. J. Historical estimates of surface marine temperatures. Annual Review of Marine Science 13, 283–311, https://doi.org/10.1146/annurev-marine-042120-111807 (2021).
Brugnara, Y., Hari, C., Pfister, L., Valler, V. & Brönnimann, S. Pre-industrial temperature variability on the Swiss Plateau derived from the instrumental daily series of Bern and Zurich. Climate of the Past 18, 2357–2379, https://doi.org/10.5194/cp-18-2357-2022 (2022).
Samakinwa, E. et al. An ensemble reconstruction of global monthly sea surface temperature and sea ice concentration 1000–1849. Scientific Data 8, 1–16, https://doi.org/10.1038/s41597-021-01043-1 (2021).
Titchner, H. A. & Rayner, N. A. The Met Office Hadley centre sea ice and sea surface temperature data set, version 2: 1. Sea ice concentrations. Journal of Geophysical Research: Atmospheres 119, 2864–2889, https://doi.org/10.1002/2013JD020316 (2014).
Jungclaus, J. H. et al. The PMIP4 contribution to CMIP6–Part 3: The last millennium, scientific objective, and experimental design for the PMIP4 past1000 simulations. Geoscientific Model Development 10, 4005–4033, https://doi.org/10.5194/gmd-10-4005-2017 (2017).
Giorgetta, M. A. et al. Climate and carbon cycle changes from 1850 to 2100 in MPI-ESM simulations for the Coupled Model Intercomparison Project phase 5. Journal of Advances in Modeling Earth Systems 5, 572–597, https://doi.org/10.1002/jame.20038 (2013).
McGregor, H. V. et al. Robust global ocean cooling trend for the pre-industrial Common Era. Nature Geoscience 8, 671–677, https://doi.org/10.1038/ngeo2510 (2015).
Tierney, J. E. et al. Tropical sea surface temperatures for the past four centuries reconstructed from coral archives. Paleoceanography 30, 226–252, https://doi.org/10.1002/2014PA002717 (2015).
Konecky, B. L. et al. The Iso2k database: a global compilation of paleo-δ¹⁸O and δ²H records to aid understanding of Common Era climate. Earth System Science Data 12, 2261–2288, https://doi.org/10.5194/essd-12-2261-2020 (2020).
Wilson, R. et al. Last millennium northern hemisphere summer temperatures from tree rings: Part I: The long term context. Quaternary Science Reviews 134, 1–18, https://doi.org/10.1016/j.quascirev.2015.12.005 (2016).
Briffa, K. R. et al. Low-frequency temperature variations from a northern tree ring density network. Journal of Geophysical Research: Atmospheres 106, 2929–2941, https://doi.org/10.1029/2000JD900617 (2001).
Breitenmoser, P. D., Brönnimann, S. & Frank, D. Forward modelling of tree-ring width and comparison with a global network of tree-ring chronologies. Climate of the Past 10, 437–449, https://doi.org/10.5194/cp-10-437-2014 (2014).
Neukom, R. et al. Inter-hemispheric temperature variability over the past millennium. Nature Climate Change 4, 362, https://doi.org/10.1038/nclimate2174 (2014).
Rutherford, S. et al. Proxy-based Northern Hemisphere surface temperature reconstructions: Sensitivity to method, predictor network, target season, and target domain. Journal of Climate 18, 2308–2329, https://doi.org/10.1175/JCLI3351.1 (2005).
Briffa, K. R., Jones, P. D., Schweingruber, F. H., Karlén, W. & Shiyatov, S. G. Tree-ring variables as proxy-climate indicators: problems with low-frequency signals. In Climatic variations and forcing mechanisms of the last 2000 years, 9–41, https://doi.org/10.1007/978-3-642-61113-1_2 (Springer-Verlag Berlin Heidelberg, 1996).
Esper, J., Cook, E. R., Krusic, P. J., Peters, K. & Schweingruber, F. H. Tests of the RCS method for preserving low-frequency variability in long tree-ring chronologies. Tree-Ring Research 59, 81–98 (2003).
Franke, J., Frank, D., Raible, C. C., Esper, J. & Brönnimann, S. Spectral biases in tree-ring climate proxies. Nature Climate Change 3, 360–364, https://doi.org/10.1038/NCLIMATE1816 (2013).
Morice, C. P. et al. An updated assessment of near-surface temperature change from 1850: the HadCRUT5 data set. Journal of Geophysical Research: Atmospheres 126, e2019JD032361, https://doi.org/10.1029/2019JD032361 (2021).
Rustemeier, E., Becker, A., Finger, P., Schneider, U. & Ziese, M. GPCC Climatology Version 2020 at 2.5°: Monthly Land-Surface Precipitation Climatology for Every Month and the Total Year from Rain-Gauges built on GTS-based and Historical Data, https://doi.org/10.5676/DWD_GPCC/CLIM_M_V2020_250 (2020).
Luterbacher, J. et al. Reconstruction of sea level pressure fields over the Eastern North Atlantic and Europe back to 1500. Climate Dynamics 18, 545–561, https://doi.org/10.1007/s00382-001-0196-6 (2002).
Luterbacher, J., Dietrich, D., Xoplaki, E., Grosjean, M. & Wanner, H. European seasonal and annual temperature variability, trends, and extremes since 1500. Science 303, 1499–1503, https://doi.org/10.1126/science.1093877 (2004).
Pauling, A., Luterbacher, J., Casty, C. & Wanner, H. Five hundred years of gridded high-resolution precipitation reconstructions over Europe and the connection to large-scale circulation. Climate Dynamics 26, 387–405, https://doi.org/10.1007/s00382-005-0090-8 (2006).
Burgdorf, A.-M. A global inventory of quantitative documentary evidence related to climate since the 15th century. Climate of the Past 18, 1407–1428, https://doi.org/10.5194/cp-18-1407-2022 (2022).
Labbé, T. et al. The longest homogeneous series of grape harvest dates, Beaune 1354–2018, and its significance for the understanding of past and present climate. Climate of the Past 15, 1485–1501, https://doi.org/10.5194/cp-15-1485-2019 (2019).
Reichen, L. et al. A decade of cold Eurasian winters reconstructed for the early 19th century. Nature Communications 13, 2116, https://doi.org/10.1038/s41467-022-29677-8 (2022).
Lundstad, E. et al. The global historical climate database HCLIM. Scientific Data 10, 44, https://doi.org/10.1038/s41597-022-01919-w (2023).
Wartenburger, R., Brönnimann, S. & Stickler, A. Observation errors in early historical upper-air observations. Journal of Geophysical Research: Atmospheres 118, 12–012, https://doi.org/10.1002/2013JD020156 (2013).
Pappert, D. et al. Unlocking weather observations from the Societas Meteorologica Palatina (1781–1792). Climate of the Past 17, 2361–2379, https://doi.org/10.5194/cp-17-2361-2021 (2021).
Freeman, E. et al. ICOADS Release 3.0: a major update to the historical marine climate record. International Journal of Climatology 37, 2211–2232, https://doi.org/10.1002/joc.4775 (2017).
Valler, V., Franke, J. & Brönnimann, S. Impact of different estimations of the background-error covariance matrix on climate reconstructions based on data assimilation. Climate of the Past 15, 1427–1441, https://doi.org/10.5194/cp-15-1427-2019 (2019).
Kilbourne, H. et al. A global multiproxy database for temperature reconstructions of the Common Era. figshare. Collection https://doi.org/10.6084/m9.figshare.c.3285353.v2 (2017).
McGregor, H. et al. NOAA/WDS Paleoclimatology - PAGES Ocean2k Synthesis Data Set, National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Information (NCEI), https://doi.org/10.25921/bba3-4t43 (2015).
Konecky, B. & McKay, N. NOAA/WDS Paleoclimatology - Iso2k database global Common Era paleo-d18O and d2H records (iso2k1_0_0,2020) https://doi.org/10.25921/57j8-vs18 (2020).
Wilson, R. et al. N-TREND: Northern Hemisphere tree-ring network development https://ntrenddendro.wordpress.com/ (2015).
Rutherford, S. et al. Proxy-based Northern Hemisphere surface temperature reconstructions: Sensitivity to method, predictor network, target season, and target domain, https://crudata.uea.ac.uk/~timo/datapages/mxdtrw.htm#refs (2005).
Max Planck Institute for Meteorology. Input for EKF400: Original ECHAM simulations (CCC400) and assimilated observations, http://cera-www.dkrz.de/WDCC/ui/Compact.jsp?acronym=EKF400_Input_Data (2017).
Brönnimann, S. & Burgdorf, A.-M. DOCU-CLIM: Collection of global documentary climate data spanning 1400 to 2000, https://doi.org/10.48620/167 (2023).
Lundstad, E., Brugnara, Y. & Brönnimann, S. Global early instrumental monthly meteorological multivariable database (HCLIM), https://doi.org/10.1594/PANGAEA.940724 (2022).
Slivinski, L. C. et al. Towards a more reliable historical reanalysis: Improvements for version 3 of the Twentieth Century Reanalysis system. Quarterly Journal of the Royal Meteorological Society 145, 2876–2908, https://doi.org/10.1002/qj.3598 (2019).
Rustemeier, E., Hänsel, S., Finger, P., Schneider, U. & Ziese, M. GPCC Climatology Version 2022 at 0.25°: Monthly land-surface precipitation climatology for every month and the total year from rain-gauges built on GTS-based and historical data, https://doi.org/10.5676/DWD_GPCC/CLIM_M_V2022_025 (2022).
Allan, R. & Ansell, T. A new globally complete monthly historical gridded mean sea level pressure dataset (HadSLP2): 1850–2004. Journal of Climate 19, 5816–5842, https://doi.org/10.1175/JCLI3937.1 (2006).
Kennedy, J. J., Rayner, N., Atkinson, C. & Killick, R. An ensemble data set of sea surface temperature change from 1850: The Met Office Hadley Centre HadSST.4.0.0.0 data set. Journal of Geophysical Research: Atmospheres 124, 7719–7763, https://doi.org/10.1029/2018JD029867 (2019).
Palmer, W. Meteorological drought. US Department of Commerce Weather Bureau Research Paper 45, 58 pp (1965).
Thornthwaite, C. W. An approach toward a rational classification of climate. Geographical Review 38, 55–94, https://doi.org/10.2307/210739 (1948).
Wells, N., Goddard, S. & Hayes, M. J. A self-calibrating Palmer drought severity index. Journal of Climate 17, 2335–2351, https://doi.org/10.1175/1520-0442(2004)017<2335:ASPDSI>2.0.CO;2 (2004).
Cook, E. R. et al. Old World megadroughts and pluvials during the Common Era. Science Advances 1, 1–9, https://doi.org/10.1126/sciadv.1500561 (2015).
Wetter, O. et al. The year-long unprecedented European heat and drought of 1540 – a worst case. Climatic Change 125, 349–363, https://doi.org/10.1007/s10584-014-1184-2 (2014).
Valler, V. et al. ModE-RA - a global monthly paleo-reanalysis of the modern era (1421 to 2008): Set 1420-3_1850-1, https://doi.org/10.26050/WDCC/ModE-RA_s14203-18501 (2023).
Hand, R., Brönnimann, S., Samakinwa, E. & Lipfert, L. ModE-Sim - a medium size AGCM ensemble to study climate variability during the modern era (1420 to 2009): Set 1420-3, https://doi.org/10.26050/WDCC/ModE-Sim_s14203 (2023).

Acknowledgements

This study was funded by the European Union H2020/ERC grant number 787574 PALAEO-RA. ARF was funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant No. 894064 (AQUATIC). The ECHAM6 simulations were performed at the Swiss Supercomputer Centre (CSCS).

Author information

These authors contributed equally: Veronika Valler, Jörg Franke.

Authors and Affiliations

Oeschger Centre for Climate Change Research, University of Bern, Bern, Switzerland
Veronika Valler, Jörg Franke, Yuri Brugnara, Eric Samakinwa, Ralf Hand, Elin Lundstad, Angela-Maria Burgdorf, Laura Lipfert, Andrew Ronald Friedman & Stefan Brönnimann
Institute of Geography, University of Bern, Bern, Switzerland
Veronika Valler, Jörg Franke, Yuri Brugnara, Eric Samakinwa, Ralf Hand, Elin Lundstad, Angela-Maria Burgdorf, Laura Lipfert, Andrew Ronald Friedman & Stefan Brönnimann

Contributions

S.B. conceived the experimental design. V.V. conducted the experiments. V.V. and J.F. analyzed the results and wrote the major part of the manuscript. S.B., Y.B., J.F., R.H. and V.V. prepared the data for the assimilation. V.V. and J.F. developed the figures and tables. All authors discussed the results, helped with the quality control of the final dataset and reviewed the manuscript.

Corresponding author

Correspondence to Jörg Franke.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Valler, V., Franke, J., Brugnara, Y. et al. ModE-RA: a global monthly paleo-reanalysis of the modern era 1421 to 2008. Sci Data 11, 36 (2024). https://doi.org/10.1038/s41597-023-02733-8

Received: 04 May 2023
Accepted: 08 November 2023
Published: 05 January 2024
DOI: https://doi.org/10.1038/s41597-023-02733-8