A high-resolution climate simulation dataset for the past 540 million years

The Phanerozoic Eon has witnessed considerable changes in the climate system as well as abundant animals and plant life. Therefore, the evolution of the climate system in this Eon is worthy of extensive research. Only by studying climate changes in the past can we understand the driving mechanisms for climate changes in the future and make reliable climate projections. Apart from observational paleoclimate proxy datasets, climate simulations provide an alternative approach to investigate past climate conditions of the Earth, especially for long time span in the deep past. Here we perform 55 snapshot simulations for the past 540 million years, with a 10-million-year interval, using the Community Earth System Model version 1.2.2 (CESM1.2.2). The climate simulation dataset includes global distributions of monthly surface temperatures and precipitation, with a 1° horizontal resolution of 0.9° × 1.25° in latitude and longitude. This open access climate dataset is useful for multidisciplinary research, such as paleoclimate, geology, geochemistry, and paleontology.

latitude and longitude. It offers elaborate global distributions of monthly surface temperatures and precipitation throughout the Phanerozoic Eon. It can be referenced and cross validated by research fields across geology, paleobiology, geochemistry, etc.
Two versions of the CESM1.2.2 are used in this study. One is a fully-coupled version which uses a T31 spectral dynamical core for the atmospheric (Community Atmosphere Model version 4, CAM4 25 ) and land (Community Land Model version 4, CLM4 26 ) components (horizontal grid of 3.75° × 3.75°) with 26 atmospheric layers in the vertical. The ocean (Parallel Ocean Program version 2, POP2 27 ) and sea-ice (Community Ice CodE version 4, CICE4 28 ) components employ a nominal 3° irregular horizontal grid (referred to as g37) with 60 oceanic layers in the vertical. The River Transport Model (RTM) has a default resolution of 0.5° × 0.5° in latitude and longitude, which directs all runoff to oceans, without interior drainage loops based on computations of surface topography.
The other one is the atmosphere-land-coupled version which applies the finite-volume dynamical core with a 1° atmosphere (f09: 0.9° × 1.25° latitude versus longitude) with the same vertical levels as T31. For this version of simulations, the model is driven by prescribed climatological monthly mean sea surface temperatures (SSTs), sea-ice (SI), and annual mean land vegetation, which are derived from the T31_g37 equilibrium simulations. Model performance of these two versions has been assessed and validated for modern conditions 29,30 .
CLM4 incorporates a carbon-nitrogen (CN) cycle component that is prognostic in carbon, nitrogen and vegetation phenology 31 . Note that here carbon and nitrogen fluxes are purely diagnostic and are not passed to the atmosphere, and thus do not influence atmospheric CO 2 concentrations 26 . Even though the carbon fluxes are only diagnostic, the CN model will have an influence on the climate simulation because seasonal and interannual vegetation phenology, i.e., leaf area index (LAI) and vegetation height, is prognostic 30 . In addition, CLM4 has the option to run the CN model with dynamic vegetation (CNDV) 32,33 . CNDV modifies the CN framework to implement plant biogeography updates, and simulates unmanaged vegetation including tree, grass, and also shrub 34 plant functional types (PFTs). It is worth pointing out that the PFTs are the same for all simulations, and that plant evolution is not considered in this study. Establishment of new PFTs is based on the warmest minimum monthly air temperature and minimum annual growing degree-days above 5 °C, and minimum precipitation of 100 mm yr −1 is required to introduce new PFTs. Survival is based on the coldest minimum monthly air temperature 35 . PFTs must be able to survive in order to establish. CNDV simulates a reasonable present-day distribution of PFTs but underestimates tundra vegetation cover 33 . Here the CNDV is active only in the fully-coupled T31_g37 model to generate PFTs.
Experimental set-up. Boundary conditions. We perform 54 time-slice simulations from 540 Ma to 10 Ma, with a time interval of 10 Myr between each two snapshot simulations. The pre-industrial simulations will be described later. Paleogeographic maps from the paleo-digital elevation model (paleoDEM) 36 are used here as boundary conditions. The paleoDEM elaborates the changing distribution of deep oceans, shallow seas, lowlands, and mountainous regions, which is an estimate of the elevation of the land surface and depth of the ocean basins measured in meters with a resolution of 1° × 1°. The digital paleogeographic maps are interpolated according to model resolutions with minor changes of the land-sea masks for the purpose of model stability. Note that the paleogeographic maps do not include information of ice sheets, and that there are no prescribed ice sheets for simulations from 540 Ma to 10 Ma. The initial land surface is set as warm grassland, and the surface soil is set to a uniform loam. CO 2 concentrations and solar radiation. Different from previous simulation studies, here we use reconstructed global mean surface temperatures (GMSTs; All GMSTs herein are annual means.) 37,38 to constrain our simulations, rather than using reconstructed CO 2 concentrations. This alternative approach of simulations is equivalent to using reconstructed GMSTs to "predict" atmospheric CO 2 concentrations. Thus, it is worthy here to briefly introduce the methodology of the GMST reconstruction 37,38 .
The time series of Phanerozoic GMSTs was reconstructed by combining estimations of pole-to-equator temperature gradients derived from lithologic records and tropical temperatures derived from oxygen isotopes 5 . First, five major Köppen belts are mapped, using lithologic indicators of climate (tillites, evaporites, coals, bauxites, etc.). Based on modern climate conditions, temperatures are assigned to each of the Köppen belts, so that the zonal mean pole-to-equator temperature profiles can be obtained. Second, oxygen isotopic values are converted to estimate tropical temperatures, with modifications based on geological and paleontological considerations. As a result, GMSTs can be calculated using meridional temperature profiles and tropical temperatures. Readers can refer to Scotese et al. 5 for comprehensive description of the methodology in deriving the GMSTs and its uncertainties.
For simulations from 540 Ma to 10 Ma, CO 2 concentrations are tuned until simulated GMSTs are asymptotic to reconstructed GMSTs within ± 0.5 °C. In the process of tuning CO 2 concentrations, we first estimate the required CO 2 concentration according to the climate sensitivity of the T31_g37 version and use it to force the model. After running the model for about 2000 years, we check the simulated GMST and decide to increase or decrease the CO 2 concentration. We need to try a few times until the difference between the simulated GMST and the reconstruction value is within ± 0.5 °C at the equilibrium state in which the net radiation at the top of www.nature.com/scientificdata www.nature.com/scientificdata/ the atmosphere (TOA), averaged over the last 100 model years, is within ± 0.1 W m −2 . Except for the CO 2 concentration, all other atmospheric compositions are set to the pre-industrial (PI) values. Solar radiation is linearly increased from 1302 W m −2 at 540 Ma to 1361 W m −2 at the present, with an increasing rate of about 0.08% per 10 Myr 39 . Orbital parameters are set to the present values. A summary of CO 2 concentrations and solar radiation used in our simulations is given in Table 1.
Two-step simulations. For the first step of simulations, the fully coupled T31_g37 CESM1.2.2 is used. The key in this step of simulations is to tune CO 2 concentrations until the simulated GMSTs are close to reconstructions at equilibrium states, that is, GMST differences between simulations and reconstructions are within ± 0.5 °C. We initialize surface temperatures of the atmospheric component model with zonally uniformly distributed values ranging from 20 °C at the equator to 1 °C at the poles for all simulations. Ocean temperature is initialized with a globally uniform vertical profile. Three types of vertical temperature profiles are chosen. For cold periods, the vertical temperature profile varies from 15 °C at the surface to 2 °C at the bottom. For warm periods, the vertical profile varies from 20 °C to 4 °C. For hot periods, the vertical profile varies from 24 °C to 8 °C. The reason why we choose the three types of vertical temperature profiles is to have simulations reach equilibrium states faster. The initial ocean salinity is globally and vertically uniform, with a value of 35 psu, for all simulations. SI and PFTs are initially set to zero. In all the simulations, there are no prescribed ice sheets, except for the PI simulation.
All simulations are integrated for more than 4000 model years to reach equilibrium states at which the net radiation at the TOA, averaged over the last 100 model years, is less than 0.1 W m −2 . Some of the simulations are even run for more than 6000 model years. The model was run with the CNDV model to generate global vegetation cover.
For the second step of simulations, repeating annual cycles of monthly SSTs and SI, as well as annual mean vegetation cover, averaged over the last 100 model years in the first step of simulations are used to drive the f09 atmosphere-land-coupled model. Paleogeography, CO 2 concentrations, and solar radiation remain the same as those in the first step. All simulations are integrated for 100 model years so that the atmosphere model reaches equilibrium states. The results presented here are the averages over the last 60 model years.
Pre-industrial simulation. For reference, the PI simulations are performed with the modern continental configuration and the model default PI vegetation cover and ice sheets. CO 2 concentration is set to the PI value, i.e., 280 ppmv. The solar constant is set as 1361 W m −2 . All other conditions are set to the PI default values. Note that the PI f09 simulation is not driven by the SSTs, SI, and vegetation derived from the T31_g37 run, but by the model default PI conditions.

Data records
The datasets are constructed in the form of the NetCDF File 'High_Resolution_Climate_Simulation_ Dataset_540_Myr.nc' and can be found in the Figshare repository 40 Figure 1a shows the time series of simulated GMSTs (black line), which range from about 12 °C to about 27 °C over the past 540 Myr. The simulated GMSTs match the reconstructed GMSTs by Scotese 37,38 (red asterisks) very well. A full list of simulated annual mean GMST values is presented in Table 1. Figure 1b shows the evolution of simulated zonal mean surface temperatures. First, zonal mean surface temperatures also demonstrate the "double hump" feature. Second, zonal mean surface temperatures show weaker meridional gradients during warmer periods such as the Early Paleozoic and the Mesozoic, and sharper meridional gradients during cooler periods such as the Late Paleozoic and the Late Cenozoic.
It is notable that the simulated equator-to-pole profiles of zonal mean surface temperatures are different from reconstructions by Scotese 37,38 , although the simulated and reconstructed GMSTs are almost the same. For example, Fig. 1c,d show zonal mean surface temperature profiles of cold climate (310 Ma) and hot climate (240 Ma), respectively. In both plots, the simulated surface temperatures are higher than reconstructions in the tropics and lower at middle latitudes, with sharper meridional gradients in the subtropics. Figure 2a shows the time series of simulated global and annual mean precipitation. It ranges between 950 mm yr −1 and 1400 mm yr −1 over the past 540 Myr. A full list of simulated global and annual mean precipitation is shown in Table 1. Annual and zonal mean precipitation is shown in Fig. 2b. There are two rain bands near the equator, with the maximum precipitation of about 3000 mm yr −1 . Note that the double rain bands could be due to the "double ITCZ" bias, which is a common problem for coupled atmosphere-ocean climate models 41,42 . The secondary rain bands are around 50°N and S, with the largest precipitation of about 1600 mm yr −1 . Two relatively dry bands are around 30 °N and S, which are the subtropical dry zones. Precipitation in both polar regions is the lowest.   www.nature.com/scientificdata www.nature.com/scientificdata/  www.nature.com/scientificdata www.nature.com/scientificdata/

technical Validation
As mentioned in the Methods section, atmospheric CO 2 concentrations are predicted by reconstructed GMSTs in the present study. Figure 5 compares CO 2 concentrations between our simulations (blue line) and reconstructions (orange line and shadings) 45 . Clearly, CO 2 concentrations in our simulations are several times higher than reconstructions. This is due to two major reasons. One is the equilibrium climate sensitivity (ECS) of CESM1.2.2, and the other one is related to the dynamic vegetation model used in our simulations.  According to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC-AR6) 47 , the likely range of ECS is between 2.5 °C and 4.0 °C, and the best estimate value is 3.0 °C. Thus, the ECS of T31_g37 CESM1.2.2 is close to the best estimate in IPCC-AR6.   Fig. 1 in Foster et al. 45 ). 68% and 95% confidence intervals of the LOESS fit are shown as dark and light grey shadings. LOESS, locally estimated scatterplot smoothing.