Abstract
The Modern Era Reanalysis (ModE-RA) is a global monthly paleo-reanalysis covering the period between 1421 and 2008. To reconstruct past climate fields an offline data assimilation approach is used, blending together information from an ensemble of transient atmospheric model simulations and observations. In the early period, ModE-RA utilizes natural proxies and documentary data, while from the 17th century onward instrumental measurements are also assimilated. The impact of each observation on the reconstruction is stored in the observation feedback archive, which provides additional information on the input data such as preprocessing steps and the regression-based forward models. The monthly resolved reconstructions include estimates of the most important climate fields. Furthermore, we provide a reconstruction, ModE-RAclim, which together with ModE-RA and the model simulations allows to disentangle the role of observations and model forcings. ModE-RA is best suited to study intra-annual to multi-decadal climate variability and to analyze the causes and mechanisms of past extreme climate events.
Similar content being viewed by others
Background & Summary
Analyzing longer time periods than what is covered by modern instruments helps to gain further insights into the climate system, its variability and understanding of historical climate events. However, to study the period before the availability of state-of-the-art instrumental measurements, e.g., to analyze the intra-annual dynamics of past climate changes, datasets with a high temporal resolution are required. Thanks to various initiatives, several new climate reconstructions of the Common Era (the past 2000 years)1,2,3,4,5, most of them with annual resolution, have become available along with data compilations6,7, and model simulations. For the last 600 years, there exist enough high-resolution data – natural proxies (e.g., tree rings, coral, ice cores), documentary evidence, and since 1658 early instrumental measurements – to attempt to generate a monthly global climate reconstruction.
To access climate variations in the past we can use modeling results and observations, but neither of the two sources can fully capture past climate variability. On the one hand, model simulations can only give a possible range of past climate states but the specific realization of random, internal variability that occurred in the real world cannot be reproduced. On the other hand, observational data become sparser in space the earlier the studied period is, in line with decreasing temporal resolution. Many recent global climate reconstructions use data assimilation to provide a best estimate (so-called analysis) of past climate fields by combining model simulations and observations1,2,3,5,8. In this study, we employ an ensemble-based Kalman filter based data assimilation approach9 and use the newest data compilations with a new ensemble of atmospheric general circulation model simulations10 to derive monthly global atmospheric paleo-reanalyses from 1421 to 2008.
Previous monthly climate reconstructions generated by our group based on a similar data assimilation technique date back to 16012,5. Compared to these reconstructions, the amount of assimilated input data in this study largely exceeds the data used in previous products, we improved our data treatment strategy, and added a new method to assimilate short time series. Between 1421 and 1657 the observational network is exclusively based on proxy records and documentary evidence. The first long instrumental measurement series become available in Europe starting in 1658. With time the observational network becomes more and more dense, reaching a peak in the 1880s and 1890s, because after 1890 we do not add any new series to the network and many of these stations stopped measuring sometime in the 20th century. The enlarged input database enables us to produce more robust reconstructions, for example over the winter seasons in the 17th and 18th centuries in the Northern Hemisphere due to the incorporation of documentary evidence and early instrumental data. The assimilated observations (temperature-sensitive and/or precipitation-sensitive proxy records, documentary evidences and early instrumental measurements over land and marine areas) have indirect or direct information about past temperature, pressure, precipitation, and wind. In addition to these variables, we reconstruct other climate fields of interest such as geopotential height and wind components on several vertical levels. Moreover, we provide a valuable observation feedback archive to be able to trace back the effect of the observations and at the same time to publish the input data. In order to quantify the effect of forced variability in the simulation on the reanalysis, we performed two reconstructions: one using an ensemble of transient model simulations as prior (ModE-RA), and one sensitivity experiment with random stationary priors (ModE-RAclim) (Table 1). A schematic overview of the implemented data assimilation system of ModE-RA is shown on Fig. 1.
ModE-RA is systematically evaluated by comparisons with independent time series, gridded instrumental observations, and other climate reconstructions. Most paleo-reanalyses are based on non-transient priors from climate simulations. Their low-frequency variability is derived from the assimilated empirical data. In contrast, the centennial-scale variability in ModE-RA originates from the model response to forcings; therefore this dataset is best suited to study intra-annual to multi-decadal variability. Observations were transformed to anomalies from 71-year running means to reduce trends due to the changing input data network through time and because centennial-scale climate variability is not consistently preserved in many of the assimilated proxy records, for instance due to an inconsistent age trends removal in tree-ring width11. ModE-RA shows good agreement with 20th century gridded monthly products, for temperature (globally), sea-level pressure (especially in the Northern Hemisphere) and precipitation (in the regions where precipitation observations were assimilated). Comparing ModE-RA with gridded annual temperature reconstructions, we find the highest correlations with the summer mean (JJA) of ModE-RA. Evaluation against independent documentary evidence shows very similar values to the calibration results indicating a good performance of ModE-RA.
The overall goal of ModE-RA is to provide a multi-variate dataset that combines all available direct and indirect climate observations, with reconstructions of external forcings and the physics represented in climate models. In future studies, ModE-RA can be used to assess climate variability, historical climate events like volcanic eruptions and to examine past extreme events, such as several month long droughts, which may have no counterpart in the modern instrumental period.
Methods
Offline data assimilation
Data assimilation (DA) has been applied in multiple paleoclimatogical studies1,2,3,12, making it possible to use sparse data with a continuously-varying observational network and, at the same time, to produce realistic climate fields in accordance with the model physics. Among the various DA techniques, ensemble-based Kalman filter methods have particularly been used in paleoclimate reconstructions1,2,13,14. Most of the reconstructions of past climate are built on already-existing model simulations and do not propagate the analysis forward in time. This technique is known as offline DA. It has been previously argued that in the case of paleoclimate reconstruction, the predictability of the system is shorter than the temporal resolution of the observations, and no benefit was found from propagating the analysis forward if the focus is on land regions13,15. A recent study16 showed that online DA can outperform offline DA when it is coupled with an ocean model. However, applying an offline DA method is computationally much cheaper than the traditional online techniques, and it allows for easy testing. Thus, we keep working with the offline approach.
Various ensemble-based Kalman filter approaches have been developed over the past few decades17,18,19,20. We use the offline variant of the ensemble square root filter (EnSRF) DA approach9 to generate a 600-year-long global monthly atmospheric paleo-reanalysis (ModE-RA). Here, we briefly describe the update part of the EnSRF, in which the observation information is optimally combined with the model simulations. The EnSRF, like other ensemble-based approaches, uses the ensemble statistics to specify the Kalman filter equations. The update step of the EnSRF is divided into an update of the ensemble mean (\(\bar{x}\)) and an update of the deviations from the ensemble mean (\({x}_{i}^{{\prime} }={x}_{i}-\bar{x}\)):
The background state vector (xb) represents the state of the atmosphere before the assimilation of empirical data. In the case of offline DA, xb is obtained from existing model simulations. xa is the analysis, the updated state where observation and model information have been merged. H is the forward operator, which maps the model values to the observed data. K and \(\widetilde{K}\) represent the Kalman gain matrix and the reduced Kalman gain matrix, which are calculated as:
Pb is the background-error covariance matrix, and R is the observation-error covariance matrix. Although it is challenging to quantify the errors both in the model simulations and in the observations, it is fairly important to estimate them accurately, since the update of the state vector depends on their relation. If the observational error variance is smaller than the model error variance, then the state vector is more strongly shifted towards the observation value; otherwise, the observation will have little impact on adjusting the model states. In ensemble-based approaches, the background-error covariances are calculated from the ensemble members17. No observations are free from errors. Instrumental observations have various error sources21,22,23, and estimating the error in documentary and proxy data is not straightforward either. How the errors of the different observation types are estimated is discussed in the Observational data section below. It is assumed that observational errors are uncorrelated. Hence, observational data can be assimilated one by one9, which further simplifies the assimilation scheme. Previous studies have shown that spurious long-range covariances can appear in the estimated Pb due to the small ensemble size. To avoid updates in the climate fields from distant observations, spatial localization is employed. Localizing Pb will limit how the observational information is distributed to all grid points as well as to the other variables in the state vector. The exact implementation of the offline EnSRF is given later in the Experimental design section.
Atmospheric model simulations
As an a priori state, we use ModE-Sim, an ensemble of simulations using the atmospheric general circulation model ECHAM610. ECHAM6 was run at T63 spectral horizontal resolution (approximately 1.8° by 1.8°). ModE-Sim is designed to sample internal variability under given boundary conditions and radiative forcings while accounting for uncertainties in these. The ensemble consists of different transient monthly-varying forcings and boundary conditions that account for uncertainties in their reconstruction10. For the period from 1421 to 1850, ModE-RA is based on ModE-Sim set 1420-3, consisting of 20 members that use sea-surface temperature (SST) reconstructions24 and climatological HadISST sea ice25 as boundary conditions, and PMIP4 radiative forcings26. For the period from 1851 to 2008, ModE-RA is based on ModE-Sim set 1850-1, which consists also of 20 members forced with HadISST2 SSTs and sea ice conditions and PMIP4 standard forcings. Concerning the mean state, ModE-Sim has the typical biases of ECHAM6 in stand-alone mode10,27. The most prominent difference compared to observations is a warm bias in the Northern Hemisphere mid-latitudes. Another warm bias exists over Australia, and cold biases over South America, India, and the northern Rocky Mountains. Furthermore, the model is too wet over the Himalayas and the Andes. However, despite its mean state biases, ModE-Sim is able to capture the internal variability of our key variables over land regions10. The performance is particularly good on seasonal-to-interannual timescales with the limitation that the ensemble spread is slightly too large in many regions10.
The 20 ensemble members were preprocessed by calculating the ensemble mean from the 20 ensemble members, which in the next step was transformed to 71-year monthly running climatologies (35 years before and 35 years after the current year when possible). These climatologies were then subtracted from each member to obtain the 71-year running anomalies in agreement with the assimilated anomalies. These anomalies form the xb state vector in the DA framework.
Observational data
In order to get the most complete information about the Earth’s climate in the last 600 years from observations, we utilize several data sources such as proxy records, documentary data, and instrumental measurements. Prior to the widespread availability of surface instrumental measurements, past climate changes can be reconstructed from proxies and documentary data. Natural proxy records are available for the full reconstruction period and instrumental measurements exist since the mid-17th century. The temporal availability and spatial distribution of the different data types are shown in Figs. 2, 3.
Natural proxy records
Paleoclimatic proxy records provide the earliest information for our climate reconstruction from many regions of the world. Particularly useful for our focus on intra-annual to multi-decadal variability are seasonally-to-annually resolved records. The proxy network used in this study consists of several previous data compilations: PAGES2k data base6, OCEAN2k data base28,29, ISO2k data base30, N-TREND31, Briffa/Schweingruber32, Breitenmoser 201433, and Neukom 201434; and involves the assimilation of bivalve, coral, ice, lake sediment, speleothem, and tree proxy records. The PAGES2k data base contains 692 temperature-sensitive records; the major archive type is trees with ring width (TRW) as the most common variable. The OCEAN2k data base is a collection of 57 coral proxies that recorded variations in sea surface temperature. The ISO2k data base includes 759 δ18O and δ18H isotope records both from continental and marine regions. The N-TREND data base contains 54 tree-ring records from the Northern Hemisphere. The Briffa/Schweingruber dataset is based on 387 tree-ring Maximum Latewood Density (MXD) chronologies; however, we use the version which has been gridded to 5° × 5° in longitude and latitude and consists of 115 grid boxes35. The Breitenmoser 2014 dataset contains 2287 consistently-processed chronologies from the International Tree Ring Data Base. The Neukom 2014 dataset includes 48 marine and 277 terrestrial paleoclimatic records from the Southern Hemisphere.
In our proxy network, most records for the last millennium come from tree rings, which have one time-integrated observation per year, primarily representing the growing season. To incorporate time-integrated information into our monthly climate reconstruction, we choose a half-yearly assimilation window. We use boreal winter (October of year-1 to March), which is the Southern Hemisphere growing season, and boreal summer (April to September), which is the Northern Hemisphere growing season. The other proxy types with annual resolution have been adapted to best fit into this scheme. For instance, ice cores may record a signal during the season of strongest precipitation. Hence, we test the proxy against the climate signal from both seasons. As a result, such non-tree ring proxies can be recorders of none, one, or both seasons.
As mentioned above, we assimilate anomalies. In the first step, all proxy data are high-pass filtered by transforming them to 71-year running anomalies, i.e., each anomaly is calculated with respect to the 35 years before and 35 years after the current year when possible. This transformation is required because low-frequency variability is not retained in many TRW records whose initial goals have not been to reconstruct climate36,37,38. In the following step, all proxy records are standardized by subtracting their mean and dividing by their standard deviation. Finally, we create multiple linear regression models for each proxy, which serve as a forward operator H in the assimilation. The regression coefficients are estimated based on at least 30 years during the period 1901 to 1970 using the HadCRUT539 and GPCC v202040 gridded instrumental datasets for temperature and precipitation, respectively. Each month of the growing season serves as an independent variable in the regression model. We only allow for consecutive months to influence the proxies, e.g., tree-growth can depend on May-June precipitation and June-July-August temperature but not on June and August temperature without July. Based on the given information in the corresponding datasets’ publications, we apply different H forward models. See Table 2 for specific details on the proxy forward models. We test 15 temperature-only models and/or 15 precipitation-only models and/or 225 mixed (both temperature and precipitation) models per proxy record. That adds up to a theoretical maximum of 255 models (combinations of months). From these, we identify the best model based on the Akaike Information Criterion (AIC). The error of the proxy record is estimated from the regression model residuals.
The records in our proxy network furthermore must have a significant climate signal. Therefore, we require the p-value of the F-Test for the regression model to be statistically significant (p < 0.05) and set a threshold for R2 > 0.1. Additionally, we have visually inspected the time series of all proxy data and the match of the forward modeled proxy data. In this procedure, we identified and excluded a few records that only exhibit multi-decadal variability despite having annual values. In the final step, duplicates have been removed since proxy records can be part of several databases. Sometimes duplicates may not have identical values because there are multiple versions of the same record, for instance, after different statistical treatments like age-trend removal. Therefore, we keep the best (highest R2) and longest record within a 10 km radius.
Documentary data
Documentary data encompass climate information from historical documents. We use them in the form of indices (temperature, precipitation, wetness/dryness) or phenological data (flowering dates, harvest dates, river or lake freezing and thawing). Documentary data have been used for regional climate reconstructions41,42,43, but only a few records have so far been included in global datasets such as the PAGES2k data base6. A new global documentary data collection was published recently, combining document-based climate evidence from all around the world for the past 600 years, comprising ca. 600 series7. Documentary data have a unique importance in climate reconstruction because they have higher temporal resolution than natural proxy records, often monthly, and they cover combinations of seasons and regions (e.g., winter in East Asia) that are otherwise not well covered with natural proxy data. From the ca. 600 available time series in the global documentary data collection, we removed those that did not extend further back than the neighboring meteorological station that would have to be used for calibration. We assimilate phenological data such as flowering date, grape harvest date, leaf coloring date, and freezing or thawing dates of rivers or lakes (or related information such as the first and last ship in a port); specific indices such as sea ice severity, temperature or precipitation indices, and wetness or dryness indices; as well as indices that were already converted to a temperature or precipitation series. Moreover, we used various wind proxies based on marine observations of wind direction. Some of the documentary series in Europe and Asia reach back to the year 1421, to 1722 (or 1640 if we consider Guatemala rainfall index) in North America, to 1540 in South America, and to 1796 (or 1750 if we consider West African Monsoon Index) in Africa.
In order to assimilate documentary data, a forward operator (H) needs to be defined. Like for the natural proxies, this was done by calibrating a multiple linear regression model based on gridded climate data or station data and is described in detail in the documentary data collection paper7,44. All necessary details on the forward model are given in the observation feedback archive files. In short, we use monthly data of the main driving variable (either temperature or precipitation). If an index covers a fixed season or month, this season (seasonal average) or month was chosen, and the forward operator then simply extracts this month or season from the state vector. For phenological dates, we also allowed months prior to the event in question to enter the model. We therefore included all previous months back to the start of the assimilation window. In a backward selection approach, we then retained only significant months (p < 0.1) but kept insignificant months between two significant months to obtain a consistent model. That differs from Burgdorf et al. (2023) only insofar that in the latter paper, all six months prior to a phenological event (defined as the month in which the 90th percentile of the time series lies) were included, whereas in our approach this was restricted to the months within the six-month assimilation window of ModE-RA. As a consequence, there are a few instances in which a series (e.g., spring thawing date) appears both in the boreal winter and in the boreal summer assimilation, as both models individually are significant. Another difference was that all models were calibrated based on anomalies from the 71-year moving average, as they will be assimilated in this form. In a few cases, highly non-Gaussian variables were transformed prior to calibration7,45,46.
The models were calibrated in an overlapping period that was ideally 70 years long and as early as possible, but we allowed for shorter periods down to 20 years. If the overlap was too short, we standardized the documentary record and scaled it with the standard deviation of the climate field data. The regression coefficients of the final model were then used as coefficients of the forward operator, and the variances of the residuals of the regression were used as an estimation of the observational error.
Instrumental measurements on land
Early meteorological measurements have a key role when studying past climate variations. A new comprehensive data compilation (HCLIM), with the focus on the early instrumental period, includes global, regional, and national databases as well as various other datasets and newly digitalized time series47. The HCLIM database contains early instrumental time series of global air temperature measurements (3633 series) between 1658–2021, monthly global atmospheric pressure measurements (807) between 1658–2020, monthly global precipitation measurements (4944) between 1677–2021, and monthly number of wet days (3072) between 1586–2019.
The records included in the HCLIM database are quality controlled but not homogenized. However, breakpoints and merging information are provided. We used the breakpoints to prepare the instrumental measurements for the assimilation. The time series were split where breakpoints are indicated and then divided into three categories: (1) long time series, more than 50 years long, (2) medium length series, between 5 and 50 years; and (3) short time series, less than 5 years long. Therefore, while HCLIM only comprises time series starting before 1890, some of the time series resulting from the splitting will have a later starting date. We chose a simple H forward operator to assimilate the measurements, which only extracts the correct month from the state vector. In addition to the forward operator, the errors in the measurements also need to be provided. We estimated the errors following the methodology of Wartenburger et al.48. In some cases, the error had to be inflated, in particular when the monthly means were calculated as the average of the monthly extremes and when they referred to the Julian calendar. For these cases, we used nine sub-daily series from the Palatine Society network49 to estimate the inflation. The estimated errors of the measurements are summarized in Table 3. The long time series are transformed to anomalies with respect to 71-year moving climatologies, similar to the proxy records and documentary data, while the medium-length and short series are first left unchanged. Furthermore, if several series of the same instrumental type belong to the same grid cell in the same year, only the ones belonging to the longest category are assimilated.
Pressure measurements from ships
Marine near-surface measurements provide information from many regions where otherwise we have no other data sources. We use the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Release 3.0 to assimilate pressure measurements50. Nighttime marine air temperature and SST measurements have been already incorporated into the SST reconstruction24 used as boundary condition in ModE-Sim10. ICOADS is a 2° × 2° gridded dataset with monthly resolution starting in 1800. In addition to the monthly values, important metadata (e.g., number of observations per month) are also available. We use the number of observations per month to decide whether the given monthly value per grid cell is a good estimation of monthly means. The number of days with observations can come from multiple ships crossing the same grid box on the same days. However, in the earlier times, it is very unlikely that many ships crossed a grid box on the same day of the month. We set 10 observations per month as a minimum number of observations to use the data as a monthly mean. Additionally, we mask all grid cells that never met the 10 observations per month criterion at any time step until 1890. That leaves us with wide coverage because many grid cells have some observations during the 1880s. Moreover, we further thinned the amount of marine pressure measurements due to the correlated errors in the case of moving ships. In a raster of 3 × 3 grid cells, we only keep the observation at the location where the number of observations per month is the highest.
The H forward operator follows the same principle as for land measurements; that is, extracting the closest grid cell for a given month from the state vector. Observational error variances are estimated to vary between 3 hPa, which has been used for land stations in case of hundreds of observations per month and 10 hPa in the case of only 10 observations per month. We used an equation similar to what has been used previously for estimating observational error of ship measurements24:
where, the systematic error variance (\({\sigma }_{systematic}^{2}\)) is approximated with 3 hPa, the random error variance (\({\sigma }_{random}^{2}\)) with 22 hPa and n denotes the number of observations. We could not calculate the 71-yr moving climatologies because of the time gaps at the grid cells, especially in the 19th century. Instead, there is just one climatology for the entire 20th century (Jan 1900 to Dec 1999). It consists of 12 values, one per month, for all grid cells. These climatologies are subtracted from the absolute values to create the anomalies for the assimilation.
Experimental design
In our DA framework we work with anomalies. As already described above, both the model simulations (xb) and long observations (y) are transformed into anomalies, except for the ship pressure measurements, where anomalies are calculated relative to the 1900–1999 climatology. Since we only assimilate anomalies, the centennial-scale variability in our dataset is influenced by the reaction of the model to prescribed forcings, whereas annual-to-multi-decadal variability is improved based on the assimilated observations. By working with anomalies, we do not correct for existing biases in the model simulations and the reconstructions remain consistent in the model world. This has the advantage of not introducing artificial trends due to temporal changes in the observation data availability.
The implementation of offline DA approaches and the estimation of the background-error covariance matrix vary among the different paleoclimate reconstructions. One offline DA method assumes a time-invariant Pb1,3, while in others Pb is forcing-dependent (e.g., amount of volcanic aerosols, ENSO state, etc.) and is recalculated from the precomputed model ensemble for this specific time step2. In a further study, time-invariant and transient covariance matrices are combined to offset the relatively small sample size of transient simulations and thereby obtain a better estimation of the errors5. Here, we also use a blending technique to calculate the background errors to compensate for the small ensemble size. The blended background errors (Pblend) are calculated by combining a climatological background-error covariance matrix Pclim with the transient one (Pb) as:
where Pclim is derived by randomly selecting 100 climate states from the 590 year-long transient model simulations (20 ensemble members * 590 years) for each assimilation window. Based on previous results51 the weights (β1, β2) of the covariance matrices set to be equal. Both β1 and β2 are 0.5. Hence, in our DA scheme, the Pb is replaced with Pblend in Eqs. 3, 4.
As mentioned above, the impact of an observation on xb is often limited by localizing the background-error covariance matrix. We implement a widely used localization technique, where each element of the error covariance matrices is multiplied with the respective element of a distant-dependent correlation function. We use a Gaussian localization function as:
where the distance between two grid boxes is denoted by z, and L stands for the length-scale parameter. We apply a stricter localization on Pb, which is calculated from 20 ensemble members than on Pclim, which we calculate from 100 members. The applied length-scale parameters are summarized in Table 4. We also set values to 0 where the distant-dependent correlation is <0.001.
In order to be able to track the impacts of observations, we provide an observation feedback archive in which metadata and information about the preparation of the input data are given. The metadata and preparation information are the following: observation and record ID, the name, the coordinates, as well as the altitude. Additionally, we provide the year in which the observation was taken, the year in which it is assimilated, the season, the data type, variable and the unit, the original and transformed value (e.g., interpolation from Julian to Gregorian calendar in early instrumental data or standardization in case of documentary and proxy data,) as well as the 71-year moving climatology and anomaly, the reference period and reference dataset, the inherited quality check flags, the observational error variance, to which cycle the observation belongs and the coefficients of the H forward operator. Regarding the assimilation, for each record we store the flags of the background check and whether the record is assimilated, the length-scale parameters, the background and analysis departure in the same grid cell from each member, the ensemble mean and from the 71-year model climatology, and a bias term of medium and short observations.
In our setup H is always linear and when necessary calibrated in the preprocessing steps and stored in the half-yearly (previous year’s October to March and April to September) observational input files together with the observational error variances. The xb state vector is built from 6 monthly fields of several variables. Therefore, we not only apply a localization in space but also in time; that is, we allow the observations to update the monthly fields based on the H forward operator. In the case of monthly instrumental measurements, these will have an influence only on one month. Seasonal documentary and proxy data, however, can affect several months. After preparing the observations and the monthly anomalies of the ensemble members, in the first step we assimilate all proxy records, documentary data and marine measurements as well as long (>50 years) instrumental measurements. Before a non-marine observation is assimilated we check first whether the observation is within the ±5 range of the square root of the sum of model and observation variances \(\left(\sqrt{{\boldsymbol{H}}{{\bf{P}}}^{{\bf{b}}}{\boldsymbol{H}}+{\bf{R}}}\right)\) from the model ensemble mean. In the case of marine observations a stricter filtering was implemented because we wanted to give less weight to pressure measurements from ships. Therefore a marine sea-level pressure record has to be within a ±2 range of the model ensemble mean. If an observation does not pass the prescribed quality check, it is assimilated passively; that is, the climate state is not updated by the observation but its potential impact is stored in the observation feedback archive file. Furthermore, instrumental measurements of the same type are averaged if more than one observation can be found in the same grid box, and then their average is assimilated.
The assimilation is paused after assimilating all proxies, documentary data, and instrumental records longer than 50 years in the first cycle (cycle1). The 5 to 50 year-long instrumental time series are debiased before they are assimilated in the second cycle (cycle2) (Fig. 1). For debiasing we use a simple approach of fitting the first two harmonics to the annual cycle of the observation and the ensemble mean of cycle1 analysis. The annual cycles of the cycle1 analysis and the medium-length time series (belonging to cycle2) are calculated from the months where measurements are available. In most cases there is a good agreement between the annual cycle of the cycle1 analysis and the medium-length observations in the temperature and pressure time series, and the harmonic fits approximate well the annual cycles. The annual cycles of number of wet days and precipitation calculated from the cycle1 analysis and the observations agree less, and the harmonic fits follow less the month-to-month variations. In cycle2, the time series are transformed to anomalies as:
where xorig is the original time series, bias is the monthly bias between the fitted annual cycles and modelclim is the 71-year running climatology of the model. From this point, the observation goes through the same steps (quality check, averaging) as in cycle1, before it is assimilated. In cycle2 the model simulations are replaced with cycle1 analysis as the background state and the covariance matrices are also recalculated from them. After assimilating all observations in cycle2, we pause again the assimilation and the biases in the short records (<5 years) are calculated similarly as for cycle2 observations, fitting the harmonics to the ensemble mean of cycle2 analysis and the observations. Then using Eq. 8 the short records are transformed to anomalies and assimilated. The cycle3 analysis is the final product, the ModE-RA paleo-reanalysis.
In addition to the ModE-RA paleo-reanalysis, we generated another reanalysis ModE-RAclim to allow the user to disentangle the effect of observations and boundary conditions (forcings) on ModE-RA. For ModE-RAclim the members of the xb state vector are randomly selected from the 589 year-long 20 transient ModE-Sim simulations. We use 100 randomly selected members at each half year in the assimilation from which the Pb background-error covariance matrix is calculated. This approach is similar to other reconstructions1,3 with the exception that they always use the same xb and Pb and work with absolute values. Note that the random sampling used for the ModE-RAclim prior largely averages out the forced signal arising from the forcings and boundary conditions in ModE-Sim and the assimilated observations have been high-pass filtered. Variability at scales longer than 71 years is not present in this sensitivity experiment. For the localization of the background-error covariance matrix the larger length-scale parameters were employed (Table 4, right column). Observations are handled as in the ModE-RA assimilation, assimilating them in three cycles.
Data Records
Most of the input datasets are publicly available and can be downloaded from the following data repositories: PAGES2k records can be found on figshare52, Ocean2k53 and Iso2k54 can be downloaded from NOAA/WDS Paleoclimatology; the N-TREND dataset can be found on the project website55; the gridded Briffa/Schweingruber tree-ring dataset can be downloaded from the Climatic Research Unit56; the processed tree-ring chronologies by Breitenmoser 2014 were used in previous climate reconstruction and are available within the EKF400 project57; DOCU-CLIM is accessible on BORIS58; HCLIM can be downloaded from PANGAEA59; ICOADS data provided by the NOAA PSL, Boulder, Colorado, USA, from their website (https://psl.noaa.gov). The record used from the Neukom 2014 dataset is included in the observation feedback archive and uploaded to WDCC (https://doi.org/10.26050/WDCC/ModE-RA_s14203-18501).
Technical Validation
The quality of the ModE-RA paleo-reanalysis is assessed by comparisons with multiple data sources, depending on their availability through time. We start with comparisons to gridded instrumental datasets in the 20th century. Because these are not fully independent and the quality of ModE-RA changes through time, we additionally compare ModE-RA to the 20th Century Reanalysis version 360 (20CRv3) and mostly proxy based reconstruction1,3,4,42. Finally, we compare ModE-RA to completely independent documentary information. As metrics for evaluation, we use the correlation and the root mean square error skill score (RMSESS). The RMSESS is calculated as:
where we replace xu with the ensemble mean of the paleo-reanalysis (ModE-RA) (\({\bar{x}}^{a}\)); xf is the ensemble mean of the model simulations (ModE-Sim), i.e., this skill score measures the improvement in comparison to ModE-Sim, which already has some reconstruction skill because it follows the external forcings and boundary conditions. (\({\overline{x}}^{b}\)). The time step is denoted with i; we calculate the RMSESS over the 1901–2000 period. For the 20th century validation, we compare the ensemble mean of ModE-RA against three reference datasets for temperature, precipitation, and sea level pressure (xref): HadCRUT539, GPCC version 202214,61, and HadSLP262. These datasets are not entirely independent from the paleo-reanalysis, but only time series starting before 1890 (before the breakpoint detection) were assimilated in ModE-RA. The skill metrics are calculated using anomalies relative to the 1961–1990 period. All three validation datasets were remapped to the spatial resolution of the model simulations. Although ModE-RA has monthly resolution, we present half-year averages in the evaluation because of similar absolute values and very small differences between single months (see Figs. S1–S3). Differences between the two half-years are more relevant because many proxies, which represent a growing season, have only been assimilated in one of the half-year assimilation windows.
Temperature correlation between ModE-RA and gridded instrumental dataset are globally high (Fig. 4a,d). Note that the HadCRUT5 dataset is a blended product and uses the HadSST4 sea-surface temperature dataset63 which is also one of the boundary conditions of ModE-Sim; therefore the high correlation values over oceans are expected (Fig. 4c,f). Correlations are also mainly positive over land between ModE-Sim and HadCRUT5 due the 20th-century warming trend, but after assimilating the observations an almost perfect correlation is achieved especially in the Northern Hemisphere and Australia (Fig. 4a,d). For pressure, ModE-Sim has the strongest correlation with HadSLP2 over the tropical ocean (Fig. 5c,f). The assimilation improves the sea-level pressure fields the most over the Northern Hemisphere, in southern South America and a larger region around Australia (Fig. 5a,d). In contrast to temperature and sea level pressure, where the forcings and boundary conditions introduced positive correlations in ModE-Sim, correlation coefficient of ModE-Sim for precipitation are close to zero (Fig. 6c,f). Hence, most information is gained by the assimilation (Fig. 6a,d). Correlation coefficients of precipitation improved in the regions, where most of the precipitation data were assimilated because of the narrower localisation length scale (Table 4).
In addition to analyzing the performance of ModE-RA, the skill of ModE-RAclim is also shown on the figures (Figs. 4b,e, 5b,e, 6b,e). The reconstructed temperature fields of ModE-RAclim have similar skill over land where the skill comes from data assimilation. ModE-RA outperforms ModE-RAclim over marine regions where the skill comes from the SST boundary conditions. The results of the sea-level pressure reconstruction of ModE-RAclim yields again similar skills to ModE-RA, but ModE-RA has higher skill over the tropical Pacific where the information comes again from the SST boundary conditions. The skill of the ModE-RAclim precipitation field is comparable to ModE-RA because ModE-Sim has correlations close to zero and most information is gained by the assimilation.
To evaluate the skill of ModE-RA compared to the model simulation, the RMSESS is calculated by using the ensemble mean of ModE-Sim in the denominator (xf in Eq. 9). With the help of the RMSESS metric we can gain further insights into how well the amplitudes of climate variability are reconstructed. The temperature fields show a notable improvement over ModE-Sim in the extra-tropical land areas of the Northern Hemisphere, the southern part of South America, and Australia (Fig. 7a,b). Sea-level pressure is better reconstructed in the October to March half year than in April to September. The largest increases in the RMSESS can be found over the North Atlantic, Europe, and the Indian Ocean (Fig. 7c). The reconstructed precipitation fields show improved skill mainly in the regions where precipitation measurements were assimilated in both half years (Fig. 7e,f).
The 20th century validation highlights the quality of ModE-RA at the end of the 19th century because no stations that started measuring after 1890 were included anymore. As seen in Fig. 2, the number of assimilated observations decreases back in time and simultaneously the spatial coverage and density shrinks. The ensemble standard deviation ratio (standard deviation of the ModE-Sim ensemble divided by the standard deviation of the ModE-RA ensemble) highlights where the assimilation added information. This remaining ensemble standard deviation after data assimilation in comparison to the prior standard deviation for six time slices between 1450 and 1950 and for two months is shown in Fig. 8 (January: a,c,e,g,i,k and July: b,d,f,h,j,l). There are few observations assimilated in the October to March season before the year 1500. First observations for the boreal winter season become available in the 16th century. In the 18th century, the spatial distribution extends into eastern North America. In contrast, there is already a dense proxy network in the northern hemisphere extra-tropical land region during the April to September season in the year 1450. This is also further increasing through time. Assimilated information in the Southern hemisphere before the year 1800 is sparse. Nevertheless, ModE-RA includes more observations from the 19th and early 20th century than any other currently existing gridded global dataset for data sparse regions such as South America or Africa.
To further evaluate ModE-RA beyond the 20th century, we compare regional-scale variations of the ModE-RA ensemble mean with other reconstructions. Several annual global temperature field reconstructions based on natural climate proxies exist over the Common Era. In a recent study, annual global temperature was generated with six different climate field reconstruction techniques (Analogue method (AM), Canonical correlation analysis (CCA), Composite plus scaling (CPS), Data assimilation (DA), GraphEM (GEM), and Principal-Component Regression (PCR)), all using the same multiproxy input data6 over the Common Era4. Additionally, we compare ModE-RA to the global temperature reconstructions with annual resolution from the Last Millennium Reanalysis (LMR), produced with a similar offline DA method and assimilating multiproxy records1. A reconstruction of paleo-hydroclimate (PHYDA) - reconstructed with an offline DA method using proxy records - provides both annual and seasonal climate fields of several variables3. Furthermore, a monthly temperature reconstruction for European land areas is also available starting in 1659, which is based on multiproxy records, documentary evidence, and instrumental data (hereafter L200442).
For the pre-20th century, we focus on the European land area (25°W–40°E and 35°N–70°N, as defined in L2004). The correlation was calculated at each grid cell between the reconstructions and the ensemble mean of ModE-RA in the overlapping period after the reconstructions were remapped to the resolution of ModE-RA. In the six reconstructions4 a year is defined as April to March, similar to the annually resolved PHYDA reconstruction3. Therefore, annual values from ModE-RA and L200442 were calculated correspondingly from the monthly April to March data. Correlation between LMR and ModE-RA in the annual comparison is calculated between January and December. All annual reconstructions based on multiproxy records mainly use information from tree rings. These primarily provide information for the summer season. Hence, we calculated the correlation between the annual reconstructions and seasonal temperature field from ModE-RA using the months June, July and August (JJA). When the temperature reconstructions are available with higher temporal resolution such as in the case of PHYDA and L2004, we use them to calculate the seasonal correlation. Annual mean correlations between ModE-RA and the other reconstructions are rather low with the exception of L2004 (Fig. 9). When the correlations are calculated between the annual reconstructions (AM, CCA, CPS, DA, GEM, PCR, LMR) and the seasonal summer mean of ModE-RA, the correlation coefficients are higher (Fig. 9), suggesting that previous annual temperature reconstructions are biased towards the summer seasons. The correlation is higher between the boreal summer PHYDA reconstruction and ModE-RA than the annual reconstruction. In contrast, L2004 shows weaker correlations in the boreal summer than in the annual mean comparison (Fig. 9). The reason behind a weaker correlation in boreal summer may be that the annual mean is dominated by winter variability.
Only few datasets have monthly or higher resolved information in the period before the year 1900. Besides L2004, 20CRv3 spans over the 1836–2012 time period and was generated with a DA technique using pressure observations and a weather forecast model60. A more experimental phase of 20CRv3 between 1806 and 1835 is also available. Here, we use L2004 and 20CRv3 to compare the area-weighted monthly temperature anomalies, i.e., after removal of the annual cycle and relative to 1821–1850 over the European land areas in the preindustrial period. The correlation coefficient between ModE-RA and L2004 European average temperature over the 1659–1850 period is 0.27, and over the 1806–1850 period is 0.28. The correlation between ModE-RA and 20CRv3 over the 1806–1850 period is 0.67. We do not expect high correlations because L2004 is a reconstruction based on a smaller dataset with a purely statistical principal component regression method and in 20CR only surface pressure is assimilated and no temperature information.
Additionally, we calculated the correlation between the 500hPa geopotential height in the ModE-RA ensemble mean and the 550hPa geopotential height in 20CRv3 from 1841 to 2000 for each decade. These fields are only updated through the data assimilation framework because no upper air observations were assimilated. The correlations are calculated based on monthly values relative to the 1961–1990 climatologies, separately for the Northern and the Southern Hemisphere using the ensemble means. In general, there is better agreement between the ModE-RA and 20CRv3 over the Northern Hemisphere with the median of correlations strongly increasing until the 1911–1921 decade then the changes become relatively small from decade to decade (Fig. 10). The agreement between ModE-RA and 20CRv3 in the Southern Hemisphere increases throughout the examined period with a slowdown from 1901 to 1950.
Furthermore, we used 71 independent documentary records (Table S1) from the documentary data compilation7 to evaluate ModE-RA in the earlier times. The independent documentary data are first calibrated with observation-based datasets as described in documentary data compilation paper7, and then these models are applied to ModE-RA for estimating the documentary observations. We found strong correlations between the independent data and the ensemble mean of ModE-RA, except for a few series, especially in North America (Fig. 11a). Comparing the correlations of ModE-RA with the calibration statistics, they are almost on a 1:1 line (Fig. 11b).
The observation feedback archive can also be used to analyze individual cases. Here, we present three examples of extremely positive and negative Palmer drought severity indices (PDSI)64 calculated using the temperature and precipitation fields from ModE-RA. The PDSI calculation requires three inputs: potential evapotranspiration (PET), precipitation, and latitude. We estimate PET with the Thornthwaite equation65, which is solely based on monthly mean temperature, in our case from ModE-RA. This PET estimate is then used together with monthly mean precipitation sums to calculate PDSI. The original version of PDSI was calibrated to conditions in central North America. Here, we use the self-calibrating version of PDSI, which adjusts parameters in the PDSI calculations to local conditions around the world66. We compared the PDSI of ModE-RA with two mainly tree-ring based reconstructions, with PHYDA3 and the Old World Drought Atlas (OWDA)67. All three examples represent the boreal summer season.
-
1.
In the summer 1789 in Central Europe, the PDSI values are negative in PHYDA and OWDA, while they are positive in ModE-RA. In the observation feedback archive we found that assimilated early instrumental precipitation measurements and information about the number of wet days per month for the region around 9°E and 49°N point to a strong wet anomaly. This is confirmed by a negative pressure anomaly in early instrumental data. In this case, there is strong support that ModE-RA can be trusted.
-
2.
In the year 1780, ModE-RA shows a strong wet anomaly in the region of Inner Mongolia, around 101° E, 40° N. In the observation feedback archive we found only one precipitation-sensitive tree-ring record in the area at that time. It turns out that the strong positive precipitation anomaly, as well as a strong cold anomaly, are already present in ModE-Sim, which was not significantly modified by the assimilation of this single tree-ring record. We could not identify an external forcing which could be responsible for this anomaly. One possible explanation could be that all members of our relatively small ensemble are coincidentally too moist in this case. Such insights can be gained from the comparison of the model simulations with the paleo-reanalysis.
-
3.
Finally, a drought in summer 1540 in Central Europe is reconstructed by both the OWDA and PHYDA, although with different intensities. PHYDA suggests a moderate drought whereas OWDA points to more extreme conditions. In ModE-RA, we assimilated documentary data in addition to the tree-ring proxies. These documentary indices show strong negative precipitation anomalies and positive temperature anomalies68. Due to the assimilation of this additional documentary evidence, ModE-RA PDSI supports the stronger drought conditions reconstructed by the OWDA.
Usage Notes
ModE-RA (ensemble members and statistics) and ModE-RAclim (ensemble statistics) are uploaded to NOAA and to the World Data Center for Climate (WDCC) at Deutsches Klimarechenzentrum in Hamburg, Germany (https://doi.org/10.26050/WDCC/ModE-RA_s14203-18501 and https://doi.org/10.26050/WDCC/ModE-RAc_s14203-18501)69. The two climate reconstructions are in NetCDF4 format; the NetCDF4 files cover the whole period per variable. Ensemble statistics include the mean (monthly anomalies with respect to the period 1901 to 2000), maximum, minimum and spread in terms of one standard deviation from the ensemble mean. The observation feedback archive files are available in tsv format (one file per 6 months), which contain all relevant information of the input data, how the input data were processed, and useful feedback information from the DA system. A detailed list of all information stored in the feedback archive has been published with the dataset: https://www.wdc-climate.de/ui/entry?acronym=ModE-RA_info. We provide an online visualisation tool to plot maps, timeseries and the locations of assimilated data: http://mode-ra.giub.unibe.ch/climeapp.
The ModE-RA paleo-reanalysis is identical to the ModE-Sim simulations70 in areas far away from any assimilated observations, especially at the beginning of the reconstruction period. With time, more and more observations are available, suggesting that the reconstruction becomes more skillful. Therefore, the users first should ensure how reliable the paleo-reanalysis is for a given region and time period. This can be achieved by looking at the ensemble spread and the differences between ModE-Sim and ModE-RA. Among the reconstructed variables, the ones with observational input data are the most realistically estimated. We encourage the users to make use of the ensemble members and not only the ensemble mean.
ModE-Sim was generated in two phases (1420–1850 and 1850–2009) with different boundary conditions10. In the earlier period, ModE-RA is based on ModE-Sim Set 1420-3, and in the later period on ModE-Sim Set 1850-1. ModE-RA is not split into the two periods of the ModE-Sim prior because the assimilated observational time series lead to a smooth transition between the two periods of the ModE-Sim sets.
ModE-RA was generated by transforming both model simulations and observations to 71-year running anomalies. Hence, users should be aware that the centennial-scale variability is the model response to forcings. Therefore, we see great potential for future research, particularly in terms of intra-annual to multi-decadal variability. We provide monthly anomalies with respect to the 1901 to 2000 climatology and the model climatology for the 1901 to 2000 period. Be aware that the model climatology includes model biases. Therefore, we recommend using anomalies instead of absolute values.
Furthermore, because of the employed setup, unrealistic values (such as negative precipitation) can occur if absolute values are generated by adding back a climatology. This is especially an issue in arid regions where monthly precipitation is not normally distributed. Precipitation is consistent in the periods of 1421–1800 and 1900–2009 when the observational network is quite stable, but in the 19th century, when many of the observation time series start, a trend is introduced in some arid land regions and tropical oceans (Fig. S4). Hence, in the case of the reconstructed precipitation fields, the early and late period should be looked at separately.
ModE-RAclim should be seen as a sensitivity study and is only a side product of the project. ModE-RAclim does not contain centennial scale climate variability. For most users, the main product ModE-RA therefore should be used for regular studies on past climate. The main differences between ModE-RAclim and ModE-RA are on the model side: ModE-RAclim uses 100 randomly picked years from ModE-Sim as a priori state. Thereby, stationarity in the covariance structure is assumed, and the externally-forced signal in the model simulations is eliminated. In combination with ModE-Sim and ModE-RA it can be used to distinguish the forced and unforced parts of climate variability seen in ModE-RA.
ModE-RA makes use of several data compilations and assimilates various direct and indirect sources of past climate compared to 20CRv3. Hence, if monthly resolution is sufficient for the planned study, ModE-RA may have higher quality already from 1850 backwards to analyze past climate changes and can be viewed as the backward extension of 20CRv3.
Code availability
The R code for the quality control of the observational data and their assimilation ModE-RA ensemble can be found together with the entire ModE-RA dataset and observation feedback archive at the World Data Center for Climate (WDCC) at Deutsches Klimarechenzentrum in Hamburg, Germany (https://doi.org/10.26050/WDCC/ModE-RA_s14203-18501).
References
Hakim, G. J. et al. The last millennium climate reanalysis project: Framework and first results. Journal of Geophysical Research: Atmospheres 121, 6745–6764, https://doi.org/10.1002/2016JD024751 (2016).
Franke, J., Brönnimann, S., Bhend, J. & Brugnara, Y. A monthly global paleo-reanalysis of the atmosphere from 1600 to 2005 for studying past climatic variations. Scientific Data 4, 170076, https://doi.org/10.1038/sdata.2017.76 (2017).
Steiger, N. J., Smerdon, J. E., Cook, E. R. & Cook, B. I. A reconstruction of global hydroclimate and dynamical variables over the Common Era. Scientific data 5, 1–15, https://doi.org/10.1038/sdata.2018.86 (2018).
Neukom, R., Steiger, N., Gómez-Navarro, J. J., Wang, J. & Werner, J. P. No evidence for globally coherent warm and cold periods over the preindustrial Common Era. Nature 571, 550–554, https://doi.org/10.1038/s41586-019-1401-2 (2019).
Valler, V., Franke, J., Brugnara, Y. & Brönnimann, S. An updated global atmospheric paleo-reanalysis covering the last 400 years. Geosc. Data J. 1–19, https://doi.org/10.1002/gdj3.121 (2021).
Emile-Geay, J. et al. A global multiproxy database for temperature reconstructions of the Common Era. Scientific data 4, 170088, https://doi.org/10.1038/sdata.2017.88 (2017).
Burgdorf, A.-M. et al. DOCU-CLIM: A global documentary climate dataset for climate reconstructions. Scientific data 10, 402, https://doi.org/10.1038/s41597-023-02303-y (2023).
Tardif, R. et al. Last Millennium Reanalysis with an expanded proxy database and seasonal proxy modeling. Climate of the Past 15, 1251–1273, https://doi.org/10.5194/cp-15-1251-2019 (2019).
Whitaker, J. S. & Hamill, T. M. Ensemble data assimilation without perturbed observations. Monthly Weather Review 130, 1913–1924, 10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2 (2002).
Hand, R., Samakinwa, E., Lipfert, L. & Brönnimann, S. Mode-sim–a medium-sized atmospheric general circulation model (agcm) ensemble to study climate variability during the modern era (1420 to 2009). Geoscientific Model Development 16, 4853–4866, https://doi.org/10.5194/gmd-16-4853-2023 (2023).
Esper, J., Cook, E. R. & Schweingruber, F. H. Low-frequency signals in long tree-ring chronologies for reconstructing past temperature variability. Science 295, 2250–2253, https://doi.org/10.1126/science.1066208 (2002).
Goosse, H. et al. Reconstructing surface temperature changes over the past 600 years using climate model simulations with data assimilation. Journal of Geophysical Research: Atmospheres 115, https://doi.org/10.1029/2009JD012737 (2010).
Bhend, J., Franke, J., Folini, D., Wild, M. & Brönnimann, S. An ensemble-based approach to climate reconstructions. Climate of the Past 8, 963–976, https://doi.org/10.5194/cp-8-963-2012 (2012).
Schneider, U., Becker, A., Finger, P., Rustemeier, E. & Ziese, M. GPCC Full Data Monthly Product Version 2020 at 0.25°: Monthly Land-Surface Precipitation from Rain-Gauges Built on GTS-Based and Historical Data. Global Precipitation Climatology Centre (GPCC) at Deutscher Wetterdienst https://doi.org/10.5676/DWD_GPCC/FD_M_V2022_025 (2020).
Matsikaris, A., Widmann, M. & Jungclaus, J. On-line and off-line data assimilation in palaeoclimatology: a case study. Climate of the Past 11, 81–93, https://doi.org/10.5194/cp-11-81-2015 (2015).
Okazaki, A., Miyoshi, T., Yoshimura, K., Greybush, S. J. & Zhang, F. Revisiting online and offline data assimilation comparison for paleoclimate reconstruction: an idealized OSSE study. Journal of Geophysical Research: Atmospheres 126, e2020JD034214, https://doi.org/10.1029/2020JD034214 (2021).
Evensen, G. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans 99, 10143–10162, https://doi.org/10.1029/94JC00572 (1994).
Houtekamer, P. L. & Mitchell, H. L. Data assimilation using an ensemble Kalman filter technique. Monthly Weather Review 126, 796–811, 10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2 (1998).
Anderson, J. L. An ensemble adjustment Kalman filter for data assimilation. Monthly weather review 129, 2884–2903, 10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2 (2001).
Bishop, C. H., Etherton, B. J. & Majumdar, S. J. Adaptive sampling with the ensemble transform Kalman filter. part I: Theoretical aspects. Monthly weather review 129, 420–436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2 (2001).
Brugnara, Y. et al. A collection of sub-daily pressure and temperature observations for the early instrumental period with a focus on the” year without a summer” 1816. Climate of the Past 11, 1027–1047, https://doi.org/10.5194/cp-11-1027-2015 (2015).
Kent, E. C. & Kennedy, J. J. Historical estimates of surface marine temperatures. Annual Review of Marine Science 13, 283–311, https://doi.org/10.1146/annurev-marine-042120-111807 (2021).
Brugnara, Y., Hari, C., Pfister, L., Valler, V. & Brönnimann, S. Pre-industrial temperature variability on the Swiss Plateau derived from the instrumental daily series of Bern and Zurich. Climate of the Past 18, 2357–2379, https://doi.org/10.5194/cp-18-2357-2022 (2022).
Samakinwa, E. et al. An ensemble reconstruction of global monthly sea surface temperature and sea ice concentration 1000–1849. Scientific data 8, 1–16, https://doi.org/10.1038/s41597-021-01043-1 (2021).
Titchner, H. A. & Rayner, N. A. The Met Office Hadley centre sea ice and sea surface temperature data set, version 2: 1. Sea ice concentrations. Journal of Geophysical Research: Atmospheres 119, 2864–2889, https://doi.org/10.1002/2013JD020316 (2014).
Jungclaus, J. H. et al. The PMIP4 contribution to CMIP6–Part 3: The last millennium, scientific objective, and experimental design for the PMIP4 past1000 simulations. Geoscientific Model Development 10, 4005–4033, https://doi.org/10.5194/gmd-10-4005-2017 (2017).
Giorgetta, M. A. et al. Climate and carbon cycle changes from 1850 to 2100 in MPI-ESM simulations for the Coupled Model Intercomparison Project phase 5. Journal of Advances in Modeling Earth Systems 5, 572–597, https://doi.org/10.1002/jame.20038 (2013).
McGregor, H. V. et al. Robust global ocean cooling trend for the pre-industrial Common Era. Nature Geoscience 8, 671–677, https://doi.org/10.1038/ngeo2510 (2015).
Tierney, J. E. et al. Tropical sea surface temperatures for the past four centuries reconstructed from coral archives. paleoceanography 30, 226–252, https://doi.org/10.1002/2014PA002717 (2015).
Konecky, B. L. et al. The Iso2k database: a global compilation of paleo-δ18O and δ2H records to aid understanding of Common Era climate. Earth System Science Data 12, 2261–2288, https://doi.org/10.5194/essd-12-2261-2020 (2020).
Wilson, R. et al. Last millennium northern hemisphere summer temperatures from tree rings: Part I: The long term context. Quaternary Science Reviews 134, 1–18, https://doi.org/10.1016/j.quascirev.2015.12.005 (2016).
Briffa, K. R. et al. Low-frequency temperature variations from a northern tree ring density network. Journal of Geophysical Research: Atmospheres 106, 2929–2941, https://doi.org/10.1029/2000JD900617 (2001).
Breitenmoser, P. D., Brönnimann, S. & Frank, D. Forward modelling of tree-ring width and comparison with a global network of tree-ring chronologies. Climate of the Past 10, 437–449, https://doi.org/10.5194/cp-10-437-2014 (2014).
Neukom, R. et al. Inter-hemispheric temperature variability over the past millennium. Nature Climate Change 4, 362, https://doi.org/10.1038/nclimate2174 (2014).
Rutherford, S. et al. Proxy-based Northern Hemisphere surface temperature reconstructions: Sensitivity to method, predictor network, target season, and target domain. Journal of Climate 18, 2308–2329, https://doi.org/10.1175/JCLI3351.1 (2005).
Briffa, K. R., Jones, P. D., Schweingruber, F. H., Karlén, W. & Shiyatov, S. G. Tree-ring variables as proxy-climate indicators: problems with low-frequency signals. In Climatic variations and forcing mechanisms of the last 2000 years, 9–41, https://doi.org/10.1007/978-3-642-61113-1_2 (Springer-Verlag Berlin Heidelberg, 1996).
Esper, J., Cook, E. R., Krusic, P. J., Peters, K. & Schweingruber, F. H. Tests of the RCS method for preserving low-frequency variability in long tree-ring chronologies. Tree-Ring Research 59, 81–98 (2003).
Franke, J., Frank, D., Raible, C. C., Esper, J. & Brönnimann, S. Spectral biases in tree-ring climate proxies. Nature Climate Change 3, 360–364, https://doi.org/10.1038/NCLIMATE1816 (2013).
Morice, C. P. et al. An updated assessment of near-surface temperature change from 1850: the HadCRUT5 data set. Journal of Geophysical Research: Atmospheres 126, e2019JD032361, https://doi.org/10.1029/2019JD032361 (2021).
Rustemeier, E., Becker, A., Finger, P., Schneider, U. & Ziese, M. GPCC Climatology Version 2020 at 2.5°: Monthly Land-Surface Precipitation Climatology for Every Month and the Total Year from Rain-Gauges built on GTS-based and Historical Data, https://doi.org/10.5676/DWD_GPCC/CLIM_M_V2020_250 (2020).
Luterbacher, J. et al. Reconstruction of sea level pressure fields over the Eastern North Atlantic and Europe back to 1500. Climate Dynamics 18, 545–561, https://doi.org/10.1007/s00382-001-0196-6 (2002).
Luterbacher, J., Dietrich, D., Xoplaki, E., Grosjean, M. & Wanner, H. European seasonal and annual temperature variability, trends, and extremes since 1500. Science 303, 1499–1503, https://doi.org/10.1126/science.1093877 (2004).
Pauling, A., Luterbacher, J., Casty, C. & Wanner, H. Five hundred years of gridded high-resolution precipitation reconstructions over Europe and the connection to large-scale circulation. Climate dynamics 26, 387–405, https://doi.org/10.1007/s00382-005-0090-8 (2006).
Burgdorf, A.-M. A global inventory of quantitative documentary evidence related to climate since the 15th century. Climate of the Past 18, 1407–1428, https://doi.org/10.5194/cp-18-1407-2022 (2022).
Labbé, T. et al. The longest homogeneous series of grape harvest dates, Beaune 1354–2018, and its significance for the understanding of past and present climate. Climate of the Past 15, 1485–1501, https://doi.org/10.5194/cp-15-1485-2019 (2019).
Reichen, L. et al. A decade of cold Eurasian winters reconstructed for the early 19th century. Nature Communications 13, 2116, https://doi.org/10.1038/s41467-022-29677-8 (2022).
Lundstad, E. et al. The global historical climate database HCLIM. Scientific Data 10, 44, https://doi.org/10.1038/s41597-022-01919-w (2023).
Wartenburger, R., Brönnimann, S. & Stickler, A. Observation errors in early historical upper-air observations. Journal of Geophysical Research: Atmospheres 118, 12–012, https://doi.org/10.1002/2013JD020156 (2013).
Pappert, D. et al. Unlocking weather observations from the Societas Meteorologica Palatina (1781–1792. Climate of the Past 17, 2361–2379, https://doi.org/10.5194/cp-17-2361-2021 (2021).
Freeman, E. et al. ICOADS Release 3.0: a major update to the historical marine climate record. International Journal of Climatology 37, 2211–2232, https://doi.org/10.1002/joc.4775 (2017).
Valler, V., Franke, J. & Brönnimann, S. Impact of different estimations of the background-error covariance matrix on climate reconstructions based on data assimilation. Climate of the Past 15, 1427–1441, https://doi.org/10.5194/cp-15-1427-2019 (2019).
Kilbourne, H. et al. A global multiproxy database for temperature reconstructions of the Common Era. figshare. Collection https://doi.org/10.6084/m9.figshare.c.3285353.v2 (2017).
McGregor, H. et al. NOAA/WDS Paleoclimatology - PAGES Ocean2k Synthesis Data Set, National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Information (NCEI), https://doi.org/10.25921/bba3-4t43 (2015).
Konecky, B. & McKay, N. NOAA/WDS Paleoclimatology - Iso2k database global Common Era paleo-d18O and d2H records (iso2k1_0_0,2020) https://doi.org/10.25921/57j8-vs18 (2020).
Wilson, R. et al. N-TREND: Northern Hemisphere tree-ring network development https://ntrenddendro.wordpress.com/ (2015).
Rutherford, S. et al. Proxy-based Northern Hemisphere surface temperature reconstructions: Sensitivity to method, predictor network, target season, and target domain, https://crudata.uea.ac.uk/~timo/datapages/mxdtrw.htm#refs (2005).
Max Planck Institute for Meteorology. Input for EKF400: Original ECHAM simulations (CCC400) and assimilated observations, http://cera-www.dkrz.de/WDCC/ui/Compact.jsp?acronym=EKF400_Input_Data (2017).
Brönnimann, S. & Burgdorf, A.-M. DOCU-CLIM: Collection of global documentary climate data spanning 1400 to 2000, https://doi.org/10.48620/167 (2023).
Lundstad, E., Brugnara, Y. & Brönnimann, S. Global early instrumental monthly meteorological multivariable database (HCLIM), https://doi.org/10.1594/PANGAEA.940724 (2022).
Slivinski, L. C. et al. Towards a more reliable historical reanalysis: Improvements for version 3 of the Twentieth Century Reanalysis system. Quarterly Journal of the Royal Meteorological Society 145, 2876–2908, https://doi.org/10.1002/qj.3598 (2019).
Rustemeier, E., Hänsel, S., Finger, P., Schneider, U. & Ziese, M. GPCC Climatology Version 2022 at 0.25°: Monthly land-surface precipitation climatology for every month and the total year from rain-gauges built on GTS-based and historical data, https://doi.org/10.5676/DWD_GPCC/CLIM_M_V2022_025 (2022).
Allan, R. & Ansell, T. A new globally complete monthly historical gridded mean sea level pressure dataset (HadSLP2): 1850–2004. Journal of Climate 19, 5816–5842, https://doi.org/10.1175/JCLI3937.1 (2006).
Kennedy, J. J., Rayner, N., Atkinson, C. & Killick, R. An ensemble data set of sea surface temperature change from 1850: The Met Office Hadley Centre HadSST. 4.0. 0.0 data set. Journal of Geophysical Research: Atmospheres 124, 7719–7763, https://doi.org/10.1029/2018JD029867 (2019).
Palmer, W. Meteorological drought. us department of commerce weather bureau research paper 45, 58 pp (1965).
Thornthwaite, C. W. An approach toward a rational classification of climate. Geographical review 38, 55–94, https://doi.org/10.2307/210739 (1948).
Wells, N., Goddard, S. & Hayes, M. J. A self-calibrating Palmer drought severity index. Journal of climate 17, 2335–2351, 10.1175/1520-0442(2004)017<2335:ASPDSI>2.0.CO;2 (2004).
Cook, E. R. et al. Old World megadroughts and pluvials during the Common Era. Science advances 1, 1–9, https://doi.org/10.1126/sciadv.1500561 (2015).
Wetter, O. et al. The year-long unprecedented European heat and drought of 1540–a worst case. Climatic change 125, 349–363, https://doi.org/10.1007/s10584-014-1184-2 (2014).
Valler, V. et al. ModE-RA - a global monthly paleo-reanalysis of the modern era (1421 to 2008): Set 1420-3_1850-1, https://doi.org/10.26050/WDCC/ModE-RA_s14203-18501 (2023).
Hand, R., Brönnimann, S., Samakinwa, E. & Lipfert, L. ModE-Sim - a medium size AGCM ensemble to study climate variability during the modern era (1420 to 2009): Set 1420-3, https://doi.org/10.26050/WDCC/ModE-Sim_s14203 (2023).
Acknowledgements
This study was funded by the European Union H2020/ERC grant number 787574 PALAEO-RA. ARF was funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant No. 894064 (AQUATIC). The ECHAM6 simulations were performed at the Swiss Supercomputer Centre (CSCS).
Author information
Authors and Affiliations
Contributions
S.B. conceived the experiments design, V.V. conducted the experiments, V.V. and J.F. analyzed the results and wrote the major part of the manuscript. S.B., Y.B., J.F., R.H. and V.V. prepared the data for the assimilation. V.V. and J.F. developed the figures and tables. All authors discussed the results, helped with the quality control of the final dataset and reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Valler, V., Franke, J., Brugnara, Y. et al. ModE-RA: a global monthly paleo-reanalysis of the modern era 1421 to 2008. Sci Data 11, 36 (2024). https://doi.org/10.1038/s41597-023-02733-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-023-02733-8