Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios

Increases in atmospheric carbon dioxide (CO2) concentrations due to fossil fuel combustion are the main driver of global warming. Satellite observations provide continuous global CO2 retrieval products that reveal the nonuniform distribution of atmospheric CO2 concentrations. However, climate simulation studies are almost all based on the assumption of globally uniform mean or latitudinally resolved CO2 concentrations. In this study, we reconstructed the historical global monthly distributions of atmospheric CO2 concentrations at 1° resolution from 1850 to 2013. The reconstruction is based on the historical monthly and latitudinally resolved CO2 concentrations, with longitudinal features derived from the fossil-fuel CO2 emissions of the Carbon Dioxide Information Analysis Center. The spatial distributions of nonuniform CO2 under the Shared Socio-economic Pathways and Representative Concentration Pathways scenarios from 2015 to 2150 were generated based on the spatial, seasonal and interannual features of the current CO2 concentrations. Including the heterogeneous CO2 distributions could enhance the realism of global climate modeling and help to better anticipate the potential socio-economic implications, adaptation practices, and mitigation of climate change.

Few climate simulation studies have been based on globally non-uniform CO2 distribution patterns [9][10][11]. Those studies show a bias reduction in estimated mean temperatures, and consequently provide some understanding of the response of the Earth system to the actual nonuniform CO2 concentrations. In the Beijing Normal University Earth System Model (BNU-ESM), the inhomogeneous CO2 simulations are driven by annual CO2 concentrations with spatial and seasonal changes derived from satellite observations [10]. In the Community Earth System Model (CESM), spatially inhomogeneous CO2 runs use prescribed gridded national-level monthly or annual CO2 emissions weighted by the population density of each grid cell [9,11]. Both BNU-ESM and CESM simulations with spatially inhomogeneous CO2 reproduce the progressive increases in temperature in better agreement with spatially distributed global surface air temperature observations than spatially homogeneous simulations [10,11].
Climate modeling that takes the CO2 distribution into account could address some of the known temperature biases in the control simulations [11]. Including the heterogeneous CO2 distribution could enhance the realism of global climate modeling. In BNU-ESM, the global mean surface air temperature in the inhomogeneous CO2 simulations is approximately 0.3 °C lower than that in the spatially uniform runs over the period 1986-2005, reducing the warming bias of the uniform runs relative to the HadCRUT4 observations [10]. In CESM, spatially homogeneous CO2 simulations overestimated climate warming over the Arctic and the tropical Pacific, while underestimating warming in the mid-latitudes and over most land areas [9]. The inhomogeneous CESM runs for 1950-2000 produce lower temperatures at both poles than the homogeneous runs, by up to 1.5 °C, including statistically significant cooling over the Barents Sea area [11].
The surface air temperature responses to spatially inhomogeneous atmospheric CO2 concentrations are mainly controlled by changes in large-scale atmospheric circulations, e.g., the Hadley cell, westerly jet, Arctic Oscillation and Rossby waves [8][9][10][11]. Local surface air temperature anomalies in nonuniform CO2 simulations are also affected by the CO2 physiological response over vegetated areas. Land plants adjust to changes in atmospheric CO2 by altering their stomatal conductance, which in turn affects evapotranspiration from plant leaves to the atmosphere [12]. This affects the ambient temperature through evaporative cooling, and the evaporated moisture alters the air humidity and influences low cloud amounts through water vapor diffusion, which is especially evident in summer when plants grow vigorously. In the polar areas, the degree of warming amplification depends strongly on the local distribution of CO2 radiative forcing, specifically through the positive local lapse-rate feedback, with ice-albedo and Planck feedbacks playing subsidiary roles, which also suggests that inhomogeneous spatial distributions of CO2 concentrations are consistent with significant climatic effects [13]. In marine ecosystems, non-uniform atmospheric CO2 and temperature biases could affect the uptake and storage of CO2 in the ocean, which would change regional atmospheric CO2 concentrations, ocean pH, ocean oxygen concentrations and primary production [14].
Existing studies with spatially homogeneous atmospheric CO2 concentrations may have underestimated the temperature gradient from mid-latitudes to high latitudes. Some atmospheric circulation patterns, e.g., the Hadley cell, westerly jet and Arctic Oscillation, are theoretically related to the mid- to high-latitude temperature gradients, and are hence potentially simulated incorrectly [9]. Spatially homogeneous atmospheric CO2 simulations underestimate the interannual variability in regional temperature and precipitation relative to the inhomogeneous simulations [9], and so can underrate the magnitudes and frequencies of extreme events such as droughts, heat waves, floods, and hurricanes [12]. The upper 3 m of Arctic permafrost, which holds twice as much carbon as the atmosphere, is thawing at an accelerating rate due to the intensification of Arctic warming, leading to the release of greenhouse gases and accelerating global warming [15]. Temperature biases arising from the response of ice-albedo-temperature feedbacks to spatially uniform CO2 would lead to overestimated polar warming relative to inhomogeneously distributed CO2 in the historical period [13].
However, climate simulation studies are almost all based on globally uniform mean or latitudinally resolved CO2 datasets for the historical and future scenarios in the Coupled Model Intercomparison Project (CMIP) [16][17][18][19]. In models that include a representation of the carbon cycle, the CMIP simulations can be driven by prescribed CO2 emissions accounting explicitly for fossil fuel combustion [19]. Feng et al. [20] provided spatially distributed anthropogenic emissions data for CMIP6, with annual resolution for the historical period and 10-year intervals for the future scenarios. A near-real-time daily CO2 emission dataset has monitored the variations in CO2 emissions from fossil fuel combustion and cement production at the national level since January 1, 2019 [21]. Shan et al. [22] constructed time series of CO2 emission inventories for China and its 30 provinces following the Intergovernmental Panel on Climate Change (IPCC) emissions accounting method with a territorial administrative scope. The other CMIP simulations can be driven by prescribed CO2 concentrations, which enables these more complex models to be evaluated fairly against models without a representation of carbon cycle processes [19]. Meinshausen et al. [17] provided prescribed global-mean greenhouse gas (GHG) concentrations using atmospheric concentration observations and emissions estimates for the historical period (1750-2005) and four different Integrated Assessment Models for the future scenarios, with some models constraining internally generated fields of GHG concentrations to match those global-mean values. For CMIP6, Meinshausen et al. [18] updated the global-mean and latitudinally resolved monthly GHG concentration datasets for the historical period. For the future period, global annual mean GHG concentration datasets are available for several alternative scenarios of future emissions and land use changes produced with integrated assessment models [19].
Here, we provide global monthly distributions of atmospheric CO2 concentrations at 1° resolution under the historical (1850-2013) and future (2015-2150) scenarios in CMIP6, which have the same global annual mean values as the CMIP6 standard CO2 dataset. The monthly CO2 distributions dataset can be accessed via the Zenodo data repository [23] (https://doi.org/10.5281/zenodo.5021361). Climate modeling that takes heterogeneous CO2 distributions into account could reduce some of the known biases in the control simulations [9][10][11], and help to better anticipate the potential socio-economic implications, adaptation practices, and mitigation of climate change.

Methods
The historical CO2 concentrations follow the CMIP6 monthly and latitudinally resolved CO2 concentrations, with longitudinal features derived from the fossil-fuel CO2 emissions of the Carbon Dioxide Information Analysis Center. The spatial distributions of CO2 under the SSP-RCP scenarios were generated based on the spatial, seasonal and interannual features of the current CO2 concentration distributions.
Historical CO2 concentrations spatial reconstruction. Because observational evidence of both the seasonality and the latitudinal gradients of CO2 concentrations in pre-industrial times is lacking, the CMIP6 project provides a consolidated dataset of historical atmospheric CO2 concentrations for Earth system modeling experiments, based on the Advanced Global Atmospheric Gases Experiment (AGAGE) and National Oceanic and Atmospheric Administration (NOAA) networks, firn and ice core data, archived air data, and a large set of published studies [18]. The dataset provides best-guess estimates of historical forcings with latitudinal and seasonal features (available at https://www.climatecollege.unimelb.edu.au/cmip6).
The atmospheric CO2 concentrations from CMIP6 are resolved in latitude but not in longitude. We reconstructed the CMIP6 historical CO2 concentration data at global 1° resolution based on the fossil-fuel CO2 emissions data from the Carbon Dioxide Information Analysis Center (CDIAC). The CDIAC fossil-fuel CO2 emissions used here are based on fossil-fuel consumption estimates, which are distributed spatially on a 1° latitude by 1° longitude grid from 1751 to 2013 [24] (available at https://cdiac.ess-dive.lbl.gov/trends/emis/meth_reg.html). However, the CDIAC data contain no CO2 emission values over the ocean or over land without human activity; these grid cells are filled with the average CO2 emissions of their latitude. The processed global carbon emissions data from CDIAC are used as the features of the CO2 distribution and seasonal cycle for downscaling the historical atmospheric CO2 concentrations in each month (Fig. 1). The ratio of the CDIAC CO2 emissions in each grid to the average of its latitude is calculated as:
$$RLAT_i = \frac{C_i}{C_{LAT}}$$
where C_i represents the CO2 emission in each grid, and C_LAT is the corresponding latitude-averaged CO2 emission. The ratio RLAT_i is normalized as,
$$RNLAT_i = \frac{RLAT_i - RLAT_{min}}{RLAT_{max} - RLAT_{min}}$$
where RNLAT_i represents the normalized ratio RLAT_i, RLAT_max is the maximum value of RLAT_i, and RLAT_min is the minimum value of RLAT_i. The maximum difference of the latitude-averaged CO2 concentrations (PD) for the CMIP6 data is calculated as,
$$PD = \mathrm{CO}_{2}^{max} - \mathrm{CO}_{2}^{min}$$
where CO_2^max is the maximum latitude CO2 concentration and CO_2^min represents the minimum latitude CO2 concentration.
The difference factor W_i in each grid is then calculated [equation not recoverable from the source text]. The reconstructed CO2 concentration in each grid, CO_{2,i}^{grid}, is obtained from the original CO2 concentration and the difference factor in that grid [equation not recoverable from the source text], where the original CO2 concentration is the CO2 concentration in CMIP6.
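The latitude-filling step and the quantities RLAT_i, RNLAT_i and PD defined above can be illustrated with a minimal sketch. This is not the code distributed in "Code.zip": the function names and the toy arrays are hypothetical, and taking RLAT_max and RLAT_min over the whole grid (rather than per latitude) is an assumption. The difference factor W_i is deliberately left out because its definition is not recoverable here.

```python
import numpy as np

def latitude_ratio_features(emissions):
    """Return (RLAT, RNLAT) for a 2-D emissions grid of shape (lat, lon).

    NaN marks cells without reported emissions (ocean, land without
    human activity). Rows that are entirely NaN or have zero mean are
    not handled in this sketch.
    """
    # Fill empty cells with the mean emission of their latitude row.
    row_mean = np.nanmean(emissions, axis=1, keepdims=True)
    filled = np.where(np.isnan(emissions), row_mean, emissions)
    # RLAT_i = C_i / C_LAT
    c_lat = filled.mean(axis=1, keepdims=True)
    rlat = filled / c_lat
    # RNLAT_i: min-max normalisation (assumption: extrema over the whole grid).
    rnlat = (rlat - rlat.min()) / (rlat.max() - rlat.min())
    return rlat, rnlat

def max_latitudinal_difference(co2_by_latitude):
    """PD: maximum minus minimum of the latitude-mean CO2 concentrations."""
    return float(np.max(co2_by_latitude) - np.min(co2_by_latitude))

# Toy 2 x 4 grid (arbitrary units) and three latitude-mean concentrations (ppm).
toy_emissions = np.array([[1.0, 3.0, np.nan, 4.0],
                          [2.0, 2.0, 6.0, np.nan]])
rlat, rnlat = latitude_ratio_features(toy_emissions)
pd_value = max_latitudinal_difference(np.array([284.0, 285.5, 286.2]))
```

By construction, a cell filled with its latitude mean has RLAT_i = 1, and RNLAT_i spans the interval from 0 to 1.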

SSP-RCPs CO 2 concentrations spatial reconstruction.
For the future period, the CMIP6 CO2 concentration data from 2015 onward were derived for the eight shared socioeconomic pathway (SSP) and representative concentration pathway (RCP) scenarios (Table 1) using the reduced-complexity climate-carbon-cycle model MAGICC7.0 [25]. The five scenarios SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5 are the priority scenarios highlighted in ScenarioMIP for the IPCC sixth assessment report [19]. SSP1-1.9 and SSP1-2.6 both follow the "sustainability" SSP1 socio-economic pathway, with radiative forcing levels of about 1.9 and 2.6 W m−2 in 2100, reflecting pathways toward the 1.5 °C and 2 °C targets of the Paris Agreement, respectively. SSP2-4.5 follows the "middle of the road" socio-economic pathway with a nominal radiative forcing level of 4.5 W m−2 by 2100. SSP3-7.0 follows the "regional rivalry" socio-economic pathway and is a medium-high radiative forcing scenario. SSP5-8.5 marks the upper edge of the SSP scenario spectrum, as a high reference scenario in a world of high fossil fuel development throughout the 21st century. SSP5-3.4 follows SSP5-8.5, an unmitigated baseline scenario, through 2040, at which point aggressive mitigation is undertaken to rapidly reduce emissions to zero by about 2070 and to net negative levels thereafter. In addition, the SSP4-6.0 and SSP4-3.4 scenarios update the RCP6.0 pathway and fill a gap at the low end of the range of future forcing pathways, respectively. The CMIP6 CO2 concentration data for each SSP-RCP scenario are available at https://esgf-node.llnl.gov/search/input4mips/.
The global annual mean atmospheric CO2 concentrations in the CMIP6 future scenarios are interpolated temporally and spatially based on the spatial distribution and seasonal cycle of the current monthly atmospheric CO2 concentrations from 2015 to 2024 (Fig. 1). These current distributions were simulated from the monthly reconstructed historical CO2 concentrations using the autoregressive integrated moving average (ARIMA) method [26,27]; the geotif2nc_2015_2024.nc file is contained within the "Code.zip" archive, accessed via the Zenodo data repository [23] (https://doi.org/10.5281/zenodo.5021361).
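The ratio (S_i^m) of the monthly CO2 concentration in each grid to the global mean, averaged during 2015-2024, is calculated as
$$S_i^m = \frac{\mathrm{CO}_{2,i}^{m}}{\overline{\mathrm{CO}_2}}$$
where CO_{2,i}^{m} is the CO2 concentration in grid i and month m and \overline{CO_2} is the global mean CO2 concentration, both averaged during 2015-2024. The CO2 concentration in each grid and month under each scenario is then derived from S_i^m and the global annual mean CO2 concentration in the CMIP6 future scenarios [equation not recoverable from the source text].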
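As an illustration of how the ratio S_i^m might be applied, the sketch below computes a monthly ratio field from a reference period and scales it by scenario annual means. Several choices here are assumptions rather than the published procedure: the global mean is taken as a cosine-latitude (area) weighted mean over all months, the scenario field is formed as a simple product of S_i^m and the global annual mean, and all function and variable names are hypothetical.

```python
import numpy as np

def monthly_ratio(baseline, lat_deg):
    """S[m, lat, lon]: monthly climatology divided by its global mean.

    baseline: array (n_years, 12, n_lat, n_lon) of monthly CO2 (ppm)
    for the reference period; lat_deg: cell-centre latitudes in degrees.
    """
    clim = baseline.mean(axis=0)                      # (12, n_lat, n_lon)
    # Assumption: area weighting by cos(latitude) for the global mean.
    weights = np.cos(np.deg2rad(lat_deg))[None, :, None] * np.ones_like(clim)
    global_mean = np.average(clim, weights=weights)   # scalar
    return clim / global_mean

def scale_to_scenario(ratio, annual_mean):
    """Assumed product form: field[y, m, lat, lon] = S[m, lat, lon] * mean[y]."""
    return np.asarray(annual_mean)[:, None, None, None] * ratio[None]

# Toy example: 10 reference years on a 3 x 4 grid, two scenario years.
lat = np.array([-60.0, 0.0, 60.0])
rng = np.random.default_rng(0)
reference = 400.0 + rng.random((10, 12, 3, 4))
s_ratio = monthly_ratio(reference, lat)
future = scale_to_scenario(s_ratio, [450.0, 500.0])
```

Under these assumptions the area-weighted annual mean of each scaled field equals the prescribed scenario value, which mirrors the stated property that the gridded data keep the CMIP6 global annual means.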

Data Records
All atmospheric CO2 output grids can be accessed via the Zenodo data repository [23] (https://doi.org/10.5281/zenodo.5021361). Each file has three dimensions: time (month of the year expressed as days since the first day of 1850, n = 1968 and 1632 for the historical and the future files, respectively); latitude (Degrees North of the equator [cell centres], n = 180); and longitude (Degrees East of the Prime Meridian [cell centres], n = 360). Each NetCDF file contains a monthly variable representing the mole fraction of carbon dioxide in air, in units of ppm and at 1° × 1° resolution (the variable is named separately in the historical file and in the future scenario files). There are 127,526,400 and 105,753,600 unique data points in the historical file and in each future scenario file, respectively. All grids are bottom-left arranged, with coordinates referenced to the prime meridian and the equator.
The spatial distribution of historical CO2 concentrations averaged during 1890-1989 shows that high CO2 concentrations appear in the developed regions, e.g., Europe and the eastern part of the United States (Fig. 2). […] (Table 2), which are associated with regional CO2 emissions. In addition, the CO2 concentration of 391.16 ppm in China is slightly lower than that in the United Kingdom, which is associated with the low CO2 concentrations in the west of China (Fig. 2).
The CO2_SSP{XYY}_2015_2150.nc files are generated for the eight SSP and RCP scenarios, SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP4-3.4, SSP4-6.0, SSP5-3.4 and SSP5-8.5, and provide global distributions of CO2 concentrations under different socio-economic development pathways and their associated radiative forcing levels (Fig. 3). In these eight scenarios, the average CO2 concentration in the Northern Hemisphere (NH) is higher than that in the Southern Hemisphere (SH). High CO2 concentrations relative to the global average are mainly distributed in Europe, the eastern United States, and East Asia. Across the scenarios, the global CO2 concentrations averaged during 2041-2060 range from 420 to 590 ppm, and those averaged during 2081-2100 range from 380 to 1030 ppm (Figs. 4, 5). Under SSP5-8.5, the average CO2 concentrations in China and the United Kingdom during 2081-2100 are 1020.70 ppm and 1021.51 ppm, respectively, while the CO2 concentration in Australia is 998.28 ppm (Table 3).

Technical Validation
In this validation section, the GOSAT surface CO2 concentration and AIRS mid-tropospheric CO2 concentration products were used for comparison with the reconstructed distributions of atmospheric CO2 concentrations. GOSAT, launched in January 2009, observes infrared light reflected and emitted from the Earth's surface and the atmosphere. It provides three-dimensional distributions of CO2 calculated from the Level 4A data product using a global atmospheric transport model from 2009 onward [28][29][30]. The data product has a horizontal resolution of 2.5° × 2.5° and a time step of six hours. The Aqua satellite was launched in May 2002 and operates in a near-polar sun-synchronous orbit; its mission is to observe the global water and energy cycle, climate change trends, and the response of the climate system to the increase in greenhouse gases [1,31]. The Atmospheric Infrared Sounder (AIRS) on board Aqua retrieves global daily or monthly CO2 concentrations over land, ocean and polar regions [6]. The AIRS mid-tropospheric CO2 concentration product is retrieved using the Vanishing Partial Derivative method [32], with a 90 km × 90 km spatial resolution covering 90°N-60°S. The AIRS CO2 retrieval product provides continuous global nonuniform distributions of mid-tropospheric CO2 concentrations from 2003 to 2016.
The multi-year mean reconstructed atmospheric CO2 concentrations are slightly higher than those of the AIRS mid-tropospheric CO2 concentration product in the NH high latitudes and mid-latitudes of the SH, but lower in the mid-latitudes of the SH. In the 45°S-60°S latitude band, the reconstructed CO2 concentrations are about 10 ppm (3%) higher than the AIRS concentrations averaged during 2003-2016, a difference that is statistically significant (Fig. 6a). Relative to the GOSAT surface CO2 concentrations, the reconstructed CO2 concentrations are about 4 ppm (1%) higher in the 30°S-60°S latitude band and about 4 ppm (1%) lower in East Asia and its adjacent sea areas; however, neither bias is statistically significant at the 5% level using Student's t test (Fig. 6b).
Relative to AIRS, there are some statistically significant seasonal overestimations of the reconstructed CO2 concentrations, exceeding 12 ppm when averaged during 2003-2016, mainly located in the 45°S-60°S latitude band in DJF (Fig. 7). In MAM, the reconstructed CO2 concentrations are 2-6 ppm lower in the NH and 2-6 ppm higher in the SH than those of AIRS. In JJA, the reconstructed CO2 concentrations are 2-6 ppm lower in the latitude bands of 30°N-60°N, 15°S-30°S, and 45°S-60°S, and 2-6 ppm higher in the 60°N-90°N latitude band than those of AIRS. In SON, the bias of the reconstructed CO2 concentrations relative to AIRS is from −2 to 2 ppm in most regions of the world, except in the 45°S-60°S latitude band.
Relative to GOSAT, there are some statistically significant seasonal overestimations of the reconstructed CO2 concentrations, between 8 and 10 ppm when averaged during 2010-2018, mainly in the 45°S-70°S latitude band in DJF (Fig. 8). In JJA, the reconstructed CO2 concentrations are over 10 ppm higher than the GOSAT data in the Far Eastern and North-western federal districts of Russia, and in Eastern Canada. In MAM, the reconstructed CO2 concentrations are 2-8 ppm lower in the NH and 2-6 ppm higher in the SH than those of GOSAT. In SON, the overestimations of the reconstructed CO2 concentrations relative to GOSAT are from 2 to 6 ppm, and the underestimations are from −6 to −2 ppm in some areas of South America, South Africa and Eastern China.
Table 3. Multi-year average atmospheric CO2 concentrations during 2081-2100 in some countries under future scenarios.
The monthly global mean reconstructed CO2 concentrations show a trend and seasonal cycles similar to those of the GOSAT surface atmospheric CO2 concentrations (Fig. 9a). The seasonal cycle, with high CO2 concentrations in MAM and low CO2 concentrations in JJA, is closely related to the seasonal cycle of plant growth [33]. The monthly global mean AIRS mid-tropospheric CO2 concentrations have a similar trend and a similar peak in each seasonal cycle to the reconstructed and GOSAT CO2 concentrations, but differ in the valley of the seasonal cycles, which is associated with the transport of atmospheric CO2 and the smaller impact of plant CO2 absorption [34,35]. The R-squared (R²) between the monthly global mean reconstructed CO2 and the AIRS CO2 product is 0.95, and the R² between the reconstructed CO2 and the GOSAT product is 0.99 (Fig. 9b). Figure 10 shows the zonal mean CO2 concentrations (ppm) for AIRS, GOSAT, and the reconstructed data during 2010-2013, averaged over land and over ocean separately. The zonal mean CO2 concentrations of the reconstructed data, averaged over land and over ocean, both have a distribution pattern similar to that of the GOSAT surface CO2 concentrations, with higher CO2 values in the Northern Hemisphere than in the Southern Hemisphere. There are some overestimates in the middle latitudes for the reconstructed CO2 concentrations, which is consistent with the high CO2 emissions in the middle latitude bands (Fig. 10). In the low and middle latitudes of the Southern Hemisphere, the reconstructed CO2 concentrations over land and over ocean both lie between the AIRS and the GOSAT CO2 concentrations (Fig. 10).
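The text does not state how the R² values were computed. A common choice for comparing two monthly series is the squared Pearson correlation coefficient, sketched below under that assumption; the function name and the toy series are hypothetical and are not data from this study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Toy series only, chosen so the result is easy to verify by hand.
toy_reconstructed = [1.0, 2.0, 3.0, 4.0]
toy_satellite = [1.0, 3.0, 2.0, 4.0]
toy_r2 = r_squared(toy_reconstructed, toy_satellite)  # 16 / 25 = 0.64
```

Because both series share a strong upward trend, a high R² between monthly global means mostly reflects agreement in the trend; detrending before the comparison would be needed to isolate agreement in the seasonal cycle.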
We also note that our historical CO2 concentration distributions should be regarded as highly uncertain. However, the plausibility of the CO2 concentration distributions is supported to some extent by comparison with satellite observations (e.g., the AIRS and GOSAT satellite CO2 concentration products) at the zonal mean and grid scales.

Usage Notes
These data are intended for use as a prior in global climate modeling and in studies of the potential socio-economic implications, mitigation of climate change, and adaptation practices. The historical global monthly distributions of atmospheric CO2 concentrations at 1° resolution from 1850 to 2013 are provided in one NetCDF file named CO2_1deg_month_1850-2013.nc. The spatial distributions of nonuniform CO2 under the SSP-RCP scenarios from 2015 to 2150 are provided in eight NetCDF files with the naming convention CO2_SSP{XYY}_2015_2150.nc, where X and YY are the shared socioeconomic pathway and the radiative forcing level at 2100, respectively. Each NetCDF file contains a monthly variable representing the mole fraction of carbon dioxide in air (ppm), with three dimensions: time (month of the year expressed as days since the first day of 1850, n = 1968 and 1632 for the historical and the future files, respectively); latitude (Degrees North of the equator [cell centres], n = 180); and longitude (Degrees East of the Prime Meridian [cell centres], n = 360). We anticipate that the dataset will be widely used in Earth system modeling, agriculture management, and socio-economic analysis, to assess the climate, environmental and socio-economic implications of considering past and ongoing inhomogeneous CO2 distributions, and for formulating spatial as well as global carbon reduction strategies.
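The file layout described above can be checked with a few lines of arithmetic before downloading the data. The sketch below derives the number of monthly time steps and data points from the stated periods and grid, and builds the scenario file names. The expansion of {XYY} into, e.g., "585" for SSP5-8.5 is an assumed reading of the naming convention, and the helper names are hypothetical.

```python
SCENARIOS = ["SSP1-1.9", "SSP1-2.6", "SSP2-4.5", "SSP3-7.0",
             "SSP4-3.4", "SSP4-6.0", "SSP5-3.4", "SSP5-8.5"]

def scenario_filename(scenario):
    """Assumed expansion of CO2_SSP{XYY}_2015_2150.nc, e.g. SSP5-8.5 -> SSP585."""
    ssp, forcing = scenario.split("-")        # "SSP5", "8.5"
    return f"CO2_{ssp}{forcing.replace('.', '')}_2015_2150.nc"

def n_months(first_year, last_year):
    """Number of monthly time steps in an inclusive range of years."""
    return (last_year - first_year + 1) * 12

def n_points(first_year, last_year, n_lat=180, n_lon=360):
    """Total data points for a monthly field on the 1 degree grid."""
    return n_months(first_year, last_year) * n_lat * n_lon

historical_steps = n_months(1850, 2013)    # 1968
future_steps = n_months(2015, 2150)        # 1632
historical_points = n_points(1850, 2013)   # 127,526,400
future_points = n_points(2015, 2150)       # 105,753,600
future_files = [scenario_filename(s) for s in SCENARIOS]
```

These values agree with the dimension sizes and data point counts given in the Data Records section; the actual file and variable names should be confirmed against the Zenodo archive.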

Code availability
The code used to perform all steps described here and shown in Fig. 1 is contained within a .zip archive named "Code.zip". The code can be accessed via the Zenodo data repository [23] (https://doi.org/10.5281/zenodo.5021361).