Background & Summary

Reducing human-made emissions of CO2 is at the heart of the climate change mitigation efforts in the Paris Agreement. In support of such efforts, the CO2 Human Emission (CHE) project has designed a prototype system to monitor CO2 fossil fuel emissions at the global scale. This challenging task requires the capability to detect and quantify the localised and relatively small signals of fossil fuel emissions in the atmosphere compared to the large variability of background CO2 concentrations not directly affected by local sources, and to distinguish anthropogenic sources from vegetation fluxes1,2,3. Using observations of atmospheric constituents to estimate emissions4,5 relies on a good understanding and accurate modelling of their atmospheric variability, which is largely determined by the weather-driven atmospheric transport together with surface biogenic fluxes and anthropogenic emissions. In the CHE project, a library of nature runs of CO2 and species co-emitted with CO2 has been produced at different scales and with varying degrees of complexity6, which complements previous nature runs7.

Nature runs are very high-resolution simulations that mimic nature, in that they provide a realistic representation of processes of interest, in this case those modulating atmospheric CO2 variability. These simulations provide a reference for Observation System Simulation Experiments (OSSEs)8 and Quantitative Network Design (QND)9. In OSSEs and QND studies, synthetic observations extracted from nature runs are used to assess the impact of different observing system configurations10. It is envisaged that such a monitoring system will rely on the use of a large variety of measurements including species co-emitted with CO2 that can help to isolate the fossil fuel emissions3,11. The future CO2M (Copernicus CO2 Monitoring) satellite mission is purposely designed to provide a high-resolution imaging capability to detect CO2 emission hotspots with high-precision observations of atmospheric CO2 concentrations2,3,12. CO2M will complement a constellation of satellites4 and a global in situ network5 to quantify the atmospheric CO2 variability from which emissions will be derived with atmospheric inversion systems.

Simulating a realistic distribution of CO2 and co-emitters depends on the representation of the surface fluxes, chemical sources/sinks, and atmospheric transport. Here we use the Copernicus Atmosphere Monitoring Service (CAMS) high-resolution forecast of CO2, CH4 and CO, which has been demonstrated to produce realistic and accurate variability of carbon weather13,14,15. The configuration of the nature run is shown in Fig. 1. Note that the CHE nature run is a free-running tracer simulation, unlike the CAMS high-resolution forecast, which is initialised daily from an atmospheric composition analysis.

Fig. 1

(a) Schematic of the production framework for the CHE nature run dataset (details of the different components of the simulation are given in the text); (b) Overview of the CHE nature run model output and strategy for comparison with different types of observations of carbon tracers and other relevant datasets such as lower resolution simulations. The differences between the CHE nature run and the various observations can be used to estimate and shed light on the different sources of uncertainty (orange boxes).

The CHE nature run aims to support scientific studies that will shed light on the challenges of estimating CO2 emissions, with the goal of building a CO2 monitoring and verification support capacity3. These challenges span a wide range of aspects, including sparse observing systems, consistency between ocean/land observations from different satellite-view modes16, large variability in the biogenic signal17, large representativity errors in anthropogenic emissions13, transport errors18 and stringent requirements for high-accuracy observations to estimate small signals with respect to large background values16,19. This global high-resolution dataset can provide a reference for testing different approaches to address those challenges.


Modeling framework

The CAMS high-resolution forecasting system at the European Centre for Medium-Range Weather Forecasts (ECMWF)13,14,20 has been used to produce the nature run dataset, which includes simulations of CO2, CH4 and CO as illustrated in Fig. 1. It is based on the Integrated Forecasting System (IFS) model cycle 46R1 used to produce the operational weather forecast from June 2019 to June 202021. The model has a reduced octahedral Gaussian grid22 with a resolution of Tco1279 (corresponding to approximately 9 km) and 137 model levels. The simulations have been produced by running a sequence of 1-day IFS forecasts of the carbon tracers and weather. The weather forecasts are initialized with a state-of-the-art re-analysis of meteorological fields (ERA5)23. The atmospheric tracers start from the CAMS re-analysis24,25 initial conditions at the initial date of the dataset and from then onwards they are cycled from one forecast to the next in a free-running style. The different model components for the carbon weather forecast in the IFS, including the representation of the emissions for the different tracers, are listed in Table 1. All the emissions at the surface are prescribed except for the CO2 biogenic fluxes, which are modelled online26,27, providing consistency between the response of fluxes to atmospheric conditions and tracer transport28. There are various differences with respect to the CAMS operational high-resolution forecast in 2015: improved anthropogenic emissions29,30,31 and natural CO2 ocean fluxes32, as well as an improved IFS model version21 and initial conditions23,24,25.
The configuration of the simulations with daily re-initialisation of the weather forecast and free-running tracers ensures consistency of the tracer evolution throughout the simulation by avoiding jumps in their concentrations brought by the assimilation of observations in the analysis, while maintaining a realistic and accurate simulation of their atmospheric transport and variability of the underlying biogenic fluxes from the model26,27.

Table 1 Model components with emission datasets used as boundary conditions in the nature run simulation and prescribed atmospheric chemical sources/sinks.

Model output

The standard parameters available from the CHE nature run dataset are listed in Table 2 and Table S1 in Supplementary Information file 1. Additional experimental tagged tracers are provided to characterize the atmospheric enhancement associated with the natural surface fluxes and anthropogenic emissions (Table 3). The enhancement can be computed by subtracting the concentrations of the background tracer without the specific emission/flux from the tracer concentration with the flux/emission. This assumes that the transport is linear. It is worth noting that artificial negative enhancements can occur in the vicinity of plumes due to numerical oscillations associated with the cubic interpolation of the advection scheme around very steep gradients. This can be considered a numerical error in the simulation. The CO2 tagged tracers are simulated without applying any mass fixer in order to ensure the signal comes only from the flux. The tagged tracers provide the enhancement during each 1-day simulation as they are re-initialised every day at 00UTC in order to avoid growing errors associated with the mass conservation33,34. This means the flux enhancement is reset to zero at 00 UTC. Detailed information on those tracers is provided in Table 4.
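The subtraction described above can be sketched as follows. This is a minimal illustration only, not the processing code used for the dataset: the function names and the sample concentrations are hypothetical, and the fields are represented as flat lists of grid-cell values rather than as GRIB or NetCDF arrays.

```python
def enhancement(with_flux, background):
    """Enhancement = tracer including the flux minus the background tracer without it.

    Both inputs are sequences of concentrations (same units, same grid cells).
    """
    if len(with_flux) != len(background):
        raise ValueError("fields must cover the same grid cells")
    return [w - b for w, b in zip(with_flux, background)]


def negative_fraction(enh):
    """Fraction of cells with a negative enhancement.

    Negative values are expected only as numerical artefacts near steep
    plume gradients, so this fraction gives a rough idea of their extent.
    """
    return sum(1 for e in enh if e < 0.0) / len(enh)


# Hypothetical CO2 values [ppm] for four grid cells (not taken from the dataset)
co2_with_ff = [400.8, 402.5, 400.0, 399.9]
co2_background = [400.0, 400.0, 400.0, 400.0]

enh = enhancement(co2_with_ff, co2_background)
frac_negative = negative_fraction(enh)
```

Because the tagged tracers are re-initialised at 00 UTC, an enhancement computed this way only reflects the flux accumulated since the start of the current 1-day forecast.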

Table 2 Content of CHE nature run dataset with different parameter types and their associated data volume for the full year.
Table 3 List of experimental CO2 tagged tracers from the CHE nature run dataset.
Table 4 Distribution of XCO2 anthropogenic enhancement (XCO2_FF) accumulated over a 24-hour period from the CHE global nature run as mean number (and percentage in bold) of model cells with XCO2_FF > 0.25 ppm (left columns) and XCO2_FF > 0.50 ppm (right columns).

Figure 1b provides an overview of the different types of model output from the CHE nature run dataset and how these can be compared to other datasets including various types of observations5,35,36 as well as atmospheric inversions/simulations of carbon tracers9. Such a comparison can shed some light on the different components of the uncertainty in the simulations of carbon tracers coming from the surface fluxes, the atmospheric transport and the representativity error associated with the limited model resolution14. A complementary lower resolution ensemble of simulations18 (25 km in the horizontal) has also been produced using the same model setup, which provides information on emission uncertainty30, transport uncertainty and the impact of meteorological uncertainty on biogenic fluxes. Two other major sources of uncertainty stem from the initial conditions of the carbon tracers at the beginning of the simulation24,25 and the biogenic flux model26,27. An estimation of these uncertainties is provided in the Technical Validation section.

Example: Using tagged tracers to characterise anthropogenic plumes over land and ocean

In order to monitor anthropogenic CO2 emissions, it is crucial to observe the CO2 plumes emanating from the emission sources. These observations need to be based either on targeted field campaign observations13 or on high-resolution imaging satellites10. As satellites have different viewing geometries over land and ocean16, it is very important to understand how many of these plumes are located over land, ocean and coastal regions. Moreover, satellite observations only provide total column CO2 over cloud-free regions. Table 4 provides an example of statistics on the proportion of anthropogenic plumes accumulated over a 24-hour period over land/ocean and the proportion of plumes under cloudy conditions for January and July 2015. These fossil fuel tagged tracers and other tagged tracers associated with the biogenic fluxes, ocean fluxes and biomass burning emissions are all included in the CHE nature run dataset (see Table 3).
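Statistics of the kind summarised in Table 4 could be derived along the following lines. This is a hedged sketch: the function name, the surface-type labels and the sample values are hypothetical, and the percentage is computed here relative to all cells exceeding the threshold, which is an assumption about the normalisation rather than a description of how Table 4 was produced.

```python
from collections import Counter


def plume_cell_stats(xco2_ff, surface_type, threshold_ppm):
    """Count cells with XCO2_FF above a threshold, grouped by surface type.

    Returns {surface_type: (count, percent_of_all_exceeding_cells)}.
    """
    if len(xco2_ff) != len(surface_type):
        raise ValueError("one surface type is needed per grid cell")
    counts = Counter(
        s for x, s in zip(xco2_ff, surface_type) if x > threshold_ppm
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: (n, 100.0 * n / total) for s, n in counts.items()}


# Hypothetical 24-hour accumulated XCO2_FF values [ppm] for six model cells
xco2_ff = [0.10, 0.30, 0.60, 0.70, 0.26, 0.00]
surface = ["land", "land", "ocean", "land", "coast", "ocean"]

stats_025 = plume_cell_stats(xco2_ff, surface, 0.25)
stats_050 = plume_cell_stats(xco2_ff, surface, 0.50)
```

The same grouping could be repeated with a cloud mask in place of the surface type to estimate the proportion of plumes under cloudy conditions.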

Example: Insights into total column variability

The CO2, CH4 and CO observing system is based on in situ observations, at the surface or from tall towers, and remote sensing observations from ground-based stations or satellites providing partial/total column observations. There are currently very few vertical profile observations from aircraft37,38 and AirCore measurements36,39 that can be used to link the two observation types. For low-resolution transport models assimilating both surface and total column observations in an atmospheric inversion framework, it can sometimes be challenging to combine the surface and total column variability for various reasons. These include errors in the remote sensing observations16, representation errors near the surface and model transport errors associated with vertical mixing40, atmospheric chemistry41, as well as long-range transport42 and the impact of stratospheric intrusions43. The global nature run can be useful to characterize the column variability of carbon tracers44 associated with transport. Figure 2 illustrates the potential use of the CHE nature run to explain the variability of XCO2, XCH4 and XCO at 24 TCCON sites. The coefficient of determination shows how much of the variance of the total column can be explained by the different layers in the column in the nature run. When the column is well mixed, the contribution from the different layers is similar. At the sites where the influence of local emissions or natural fluxes is strong, the layers near the surface dominate the variability. Long-range transport in the free troposphere and upper troposphere/lower stratosphere also plays an important role, as depicted by the green/orange bars with higher r2 values than the near-surface layers in purple/red. The dataset can also be used to assess the important contribution of the stratosphere to the variability of XCH445.

Fig. 2

Coefficient of determination (r2) [%] of CO2, CH4 and CO total column with different partial layers in the atmospheric column in January and July 2015 at 24 TCCON sites. The atmospheric layers are defined as follows: from the surface to 400 m (SFC), from 400 m to 2 km (BL), from 2 km to 5 km (FT), from 5 km to 10 km (UTLS), from 10 km to the top of the atmosphere (STRAT). All the column and partial column data have been detrended before calculating the coefficient of determination. All r2 values shown are statistically significant with p-value < 0.01 except when r2 < 0.001.
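The calculation behind Fig. 2 can be illustrated with a short sketch. It assumes that detrending means removing a least-squares linear trend from evenly spaced time series and that r2 is the squared Pearson correlation between a partial-column series and the total-column series; both are assumptions, as are the function names and the sample values.

```python
def detrend(series):
    """Remove the least-squares linear trend (and mean) from an evenly spaced series."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(series)]


def r2_percent(partial_column, total_column):
    """Coefficient of determination [%] between two detrended series."""
    x = detrend(partial_column)
    y = detrend(total_column)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return 100.0 * sxy * sxy / (sxx * syy)


# Hypothetical series: the total column is a scaled copy of one layer plus a trend,
# so the layer should explain all of the detrended variance.
layer = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
total = [2.0 * v + 3.0 + 0.5 * t for t, v in enumerate(layer)]
r2_layer = r2_percent(layer, total)
```

Applying `r2_percent` to each of the SFC, BL, FT, UTLS and STRAT partial columns in turn would give one bar per layer and site, in the spirit of Fig. 2.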

Data Records

The CHE nature run dataset can be accessed through the ECMWF API following the examples provided in ref. 46. The data can be extracted on the native octahedral grid with the original resolution (Tco1279, corresponding to approximately 9 km) or on a regular latitude/longitude grid at the resolution required by the user. Both GRIB and NetCDF formats are available. The dataset extends from 26 December 2014 to 31 December 2015. The list of contents is provided in Table 2. All meteorological and tracer fields and surface fluxes have been archived with 3-hourly time steps with respect to the 00 UTC initialization of the weather forecast. Step 0 of all the meteorological parameters represents the initial conditions taken from ERA523. Atmospheric species (CO2, CH4 and CO) at step 0 are equivalent to tracers from the previous day at step 24, because they are free-running from one 1-day forecast to the next as illustrated in Fig. 1. Note that the emissions of CO and the CO2 emissions from aviation are not stored in the CHE nature run dataset, but they can be obtained from the Copernicus Atmosphere Data Store.
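The relation between a valid time and the archived forecast base date and step can be sketched with plain date arithmetic. The snippet below does not call the ECMWF API; the function name is hypothetical, and it only encodes the 3-hourly, 00 UTC based archiving described above, without checking that the date falls within the dataset period.

```python
from datetime import datetime, timedelta

STEP_HOURS = 3  # archiving interval relative to the 00 UTC initialisation


def forecast_steps(valid_time):
    """Return the (base_date, step) pairs that correspond to a UTC valid time.

    At 00 UTC there are two candidates: step 0 of the same day and step 24
    of the previous day. For the free-running tracers the two are equivalent;
    for meteorological fields step 0 holds the ERA5 initial conditions.
    """
    on_grid = (
        valid_time.minute == 0
        and valid_time.second == 0
        and valid_time.microsecond == 0
        and valid_time.hour % STEP_HOURS == 0
    )
    if not on_grid:
        raise ValueError("valid time must fall on a 3-hourly step")
    base = valid_time.replace(hour=0)
    options = [(base.date(), valid_time.hour)]
    if valid_time.hour == 0:
        options.append(((base - timedelta(days=1)).date(), 24))
    return options


midnight = forecast_steps(datetime(2015, 1, 1, 0))
afternoon = forecast_steps(datetime(2015, 7, 15, 15))
```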

Technical Validation

The dataset is based on the state-of-the-art operational NWP and CAMS forecasting system21,47, which has been proven to produce reliable and accurate atmospheric CO2, CH4 and CO variability13,14,15. The CHE nature run focuses on 2015, a year characterised by a pronounced decrease in the terrestrial carbon sink associated with the strong El Niño Southern Oscillation (ENSO) of 2015-201648 with droughts49, as well as fires in several regions, particularly over the tropics50. The larger than normal CO2 atmospheric growth rate in 201548,51 and anomalously high fire emissions are well captured by the CHE nature run with a total global annual flux of 6.60 GtC (equivalent to 3.16 ppm/year from 1 January 2015 to 31 December 2015), which is close to the NOAA estimate of 2.99 ± 0.07 ppm/year. The CO2 components of the budget include 9.29 GtC of anthropogenic emissions, 2.09 GtC of fire emissions, a 2.10 GtC ocean sink and a 2.69 GtC sink from land ecosystems. These values are consistent with the global carbon budget estimates52.
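As a simple consistency check, the budget components quoted above can be summed and compared with the reported total. The variable names are illustrative; the conversion factor is not stated in the text and is only inferred here from the two reported numbers.

```python
# Budget components for 2015 as quoted in the text [GtC]
anthropogenic = 9.29
fires = 2.09
ocean_sink = 2.10
land_sink = 2.69

# Sources minus sinks; the result differs from the reported 6.60 GtC
# only by the rounding of the individual components.
net_flux = anthropogenic + fires - ocean_sink - land_sink

reported_total_gtc = 6.60
reported_growth_ppm = 3.16

# Conversion implied by the two reported values [GtC per ppm]
implied_gtc_per_ppm = reported_total_gtc / reported_growth_ppm

# Difference between the simulated growth rate and the NOAA estimate [ppm/year]
noaa_growth_ppm = 2.99
growth_difference = reported_growth_ppm - noaa_growth_ppm
```

The components sum to 6.59 GtC, and the simulated growth rate exceeds the NOAA estimate by 0.17 ppm/year, which is larger than the quoted ±0.07 ppm/year uncertainty of that estimate.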

Example: Evaluation of CO2 sources/sinks by vegetation

Biogenic CO2 fluxes associated with vegetation over land can dominate atmospheric CO2 variability on a wide range of time scales from diurnal, synoptic, seasonal to inter-annual28. They are a crucial component for the estimation of the background CO2 underlying the fossil fuel plumes from emission hotspots. This background CO2 has not been directly influenced by the plumes emanating from local anthropogenic sources, but it results from the larger-scale fluxes associated with biogenic sources and sinks over land. The European Eddy Covariance (EC) ecosystem flux data collected and processed by the Integrated Carbon Observation System (ICOS)53 are used to evaluate the uncertainty of modelled biogenic fluxes in the IFS (Fig. 3) which are bias-corrected27 in the CHE nature run. These modelled fluxes are also compared to other flux products, such as FLUXCOM54,55 (extended to include varying diurnal meteorology from ERA5) and the CAMS CO2 inversion (v18r3) product56,57. The EC data were processed and the Gross Primary Production (GPP) and ecosystem respiration (Reco) estimated using the standard methods applied in FLUXNET58 using the observed Net Ecosystem Exchange (NEE). Fig. 3 shows an overall underestimation of the seasonal cycle of NEE, GPP and Reco at the EC sites with typical errors of around 2 μmol m−2 s−1. Synoptic-scale errors are smaller while the diurnal cycle has larger errors of around 4 μmol m−2 s−1 (not shown in Fig. 3). This underestimation is exacerbated by the anomalously high NEE and Reco observed during the European drought in 2015 (Fig. SB7.349). This type of evaluation can be used to understand the source of biogenic flux errors and improve the underlying biogenic models, as well as to quantify the uncertainty of prior fluxes for atmospheric inversions59.

Fig. 3

Mean seasonal cycle of CO2 biogenic fluxes [μmol m−2 s−1] at the 25 Eddy Covariance sites. FLUXNET201558 observations [ICOS 2018 drought dataset53] are shown in black; the IFS modelled fluxes in cyan and the bias-corrected fluxes used in the CHE nature run in blue; the CAMS inversion product56,57,65 (total flux minus anthropogenic emissions) based on surface observations is shown in orange; and the CHE FLUXCOM product54,55 in green. The shading depicts the standard deviation across the 25 sites.
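The cross-site mean and spread shown in Fig. 3 could be computed as in the sketch below. The function name, the site labels and the flux values are hypothetical, and the use of the population standard deviation is an assumption, since the caption does not specify the estimator.

```python
from statistics import mean, pstdev


def seasonal_cycle_across_sites(monthly_by_site):
    """Return (mean, standard deviation) across sites for each of 12 months.

    monthly_by_site maps a site name to its 12 monthly mean fluxes.
    """
    n_months = 12
    for site, values in monthly_by_site.items():
        if len(values) != n_months:
            raise ValueError(f"{site}: expected {n_months} monthly values")
    cycle_mean, cycle_std = [], []
    for month in range(n_months):
        across_sites = [values[month] for values in monthly_by_site.values()]
        cycle_mean.append(mean(across_sites))
        cycle_std.append(pstdev(across_sites))
    return cycle_mean, cycle_std


# Hypothetical monthly NEE [umol m-2 s-1] at two sites (not observed values)
nee = {
    "site_a": [1.0, 1.0, 0.0, -2.0, -4.0, -6.0, -6.0, -4.0, -2.0, 0.0, 1.0, 1.0],
    "site_b": [3.0, 3.0, 2.0, 0.0, -2.0, -4.0, -4.0, -2.0, 0.0, 2.0, 3.0, 3.0],
}
nee_mean, nee_std = seasonal_cycle_across_sites(nee)
```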

Example: Simulation and observation mismatch in the total column of CO2, CH4 and CO

The TCCON data60, which are widely used as a reference to evaluate biases in global measurements of CO2, CH4 and CO total column averages (referred to as XCO2, XCH4 and XCO) from space16, are used here to assess the inter-hemispheric gradient, seasonal cycle and synoptic day-to-day variability in the nature run dataset (Fig. 4). The large-scale patterns of variability on a monthly scale are generally well represented for the three species. The amplitude of the XCO2 seasonal cycle is underestimated at most TCCON sites, with the summer trough being 1 to 3 ppm higher than observed. This is consistent with the general underestimation of the biogenic sink during the growing season shown in Fig. 3. XCH4 is overestimated in spring/summer and underestimated in autumn/winter, due to errors in the seasonality of the chemical sink and emissions (e.g. wetlands, agriculture and biomass burning). XCO is underestimated in winter, which is a common feature in many models and emission data sets61, and overestimated in summer/autumn, often caused by the biogenic emissions of isoprene, which have a large impact on southern hemisphere and global background values62 of CO. Other sources of error are associated with the chemical sources/sinks61 and fire emissions63, as 2015 was an extreme year for CO because of Indonesian fires in autumn64. Part of the bias shown in Fig. 4 also comes from the CO2, CH4 and CO initial conditions at the start of the nature run extracted from the CAMS re-analysis24,25. The random error in the sub-monthly variability (STDE in Fig. 4), associated with surface fluxes/emissions and atmospheric transport, is generally below 1.5 ppm for XCO2, 10 ppb for XCH4 and 10 ppb for XCO, except at urban sites near emission hotspots such as Pasadena, Tsukuba and Paris.
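The two error measures discussed above can be illustrated as follows. The sketch assumes that the bias is the mean of the model-minus-observation differences and that STDE is the standard deviation of those differences within a given period; the exact definitions used for Fig. 4 are not given in the text, and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def bias_and_stde(model, obs):
    """Return (bias, STDE) of co-located model and observed values.

    bias: mean of model - obs (systematic error)
    STDE: standard deviation of model - obs (random error)
    """
    if len(model) != len(obs):
        raise ValueError("model and observations must be co-located")
    diffs = [m - o for m, o in zip(model, obs)]
    return mean(diffs), pstdev(diffs)


# Hypothetical XCO2 values [ppm] within one month at one site
xco2_model = [401.0, 402.0, 400.0, 403.0]
xco2_obs = [400.0, 400.0, 400.0, 400.0]

bias, stde = bias_and_stde(xco2_model, xco2_obs)
```

Separating the two measures in this way distinguishes errors that could be removed by a constant offset, such as those inherited from the initial conditions, from errors in the day-to-day variability.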

Fig. 4

Evaluation of XCO2, XCH4 and XCO from the CHE nature run (NR). The nature run is compared to total column FTIR observations35,60 at the TCCON stations67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90 (OBS). The crosses indicate that the bias is statistically significant (p-value < 0.01).

Example: Fine-scale structure in vertical profiles

The vertical profiles of CO2, CH4 and CO are illustrated in Fig. 5 with a comparison to AirCore observations36,39 from the National Oceanic and Atmospheric Administration (NOAA) Global Monitoring Laboratory and the lower-resolution CAMS surface in situ inversion dataset57,65,66. While most global transport models used in atmospheric inversion systems have too coarse horizontal and vertical resolution to be able to represent the fine-scale vertical structure, the CHE nature run is able to capture the small-scale anomalies along the atmospheric column from the surface up to the lower stratosphere (50 hPa). The profiles on three different consecutive days show the large variability associated with day-to-day synoptic transport, particularly for CO2. Capturing this type of vertical variability is important because it reflects the ability of atmospheric transport models to represent vertical mixing and long-range transport. Both need to be accurately represented in atmospheric inversions in order to accurately infer surface fluxes. Examples of anticorrelation between the near-surface CO2 and XCO2 are also shown in Fig. 5j (e.g. 7, 9, 15, 20, 21 and 24 June) which are associated with the advection of anomalously high/low CO2 air in the free troposphere (above 700 hPa) and the opposite decrease/increase of CO2 near the surface. This emphasizes the importance of tracer transport above the planetary boundary layer in explaining the variability of XCO2 also shown in Fig. 2.

Fig. 5

Examples of CO2, CH4 and CO vertical profiles from the CHE nature run at Sodankylä (67.37°N, 26.63°E). The nature run is compared to NOAA AirCore (v20201223) observations36,39 and the CAMS CO2 and CH4 inversion57,65,66 (a–i) during three days in June, depicted by the dashed lines in (j), where the nature run Hovmöller plot for CO2 shows the temporal variability of the vertical profile at Sodankylä over the whole month of June. The solid black and magenta lines show the time series of XCO2 and near-surface CO2 averaged over the model levels from the surface to 400 m above the surface (SFC CO2), respectively.