China CO2 emission accounts 1997–2015

China is the world’s top energy consumer and CO2 emitter, accounting for 30% of global emissions. Compiling an accurate accounting of China’s CO2 emissions is the first step in implementing reduction policies. However, no annual, officially published emissions data exist for China. The current emissions estimated by academic institutes and scholars exhibit great discrepancies. The gap between the different emissions estimates is approximately equal to the total emissions of the Russian Federation (the 4th highest emitter globally) in 2011. In this study, we constructed the time-series of CO2 emission inventories for China and its 30 provinces. We followed the Intergovernmental Panel on Climate Change (IPCC) emissions accounting method with a territorial administrative scope. The inventories include energy-related emissions (17 fossil fuels in 47 sectors) and process-related emissions (cement production). The first version of our dataset presents emission inventories from 1997 to 2015. We will update the dataset annually. The uniformly formatted emission inventories provide data support for further emission-related research as well as emissions reduction policy-making in China.


Methods
The CO2 emissions in this dataset were estimated following the IPCC administrative territorial-based accounting scope. The administrative territorial emissions refer to emissions 'taking place within national (including administered) territories and offshore areas over which the country has jurisdiction (page overview.5)' 18 . The territorial-based emissions do not include emissions from international aviation or shipping 19 . The administrative territorial emissions can be used to evaluate the human-induced emissions by domestic production and resident activities directly within one region's boundaries 20,21 . Our CO2 emission inventories were constructed in two parts: energy- and process-related (cement) CO2 emissions. The energy-related emissions can be calculated using two approaches: the sectoral and reference approaches. Figure 1 presents a diagram of the entire construction of our emission inventories.

Energy-related sectoral approach emissions
The energy-related emissions refer to the CO2 emitted during fossil fuel combustion. According to the IPCC guidelines 22 , the sectoral approach emissions are calculated based on the fossil fuels' sectoral combustion; see equation (1) below.

$$CE_{ij} = AD_{ij} \times NCV_i \times CC_i \times O_{ij} \qquad (1)$$

where CE_ij refers to the CO2 emissions from fossil fuel i burned in sector j; AD_ij represents the consumption of fossil fuel i in sector j; NCV_i refers to the net calorific value, which is the heat value produced per physical unit of fossil fuel combustion; CC_i (carbon content) is the CO2 emissions per net calorific value produced by fossil fuel i; and O_ij is the oxygenation efficiency, which refers to the oxidation ratio during fossil fuel combustion. The subscripts i (fossil fuel) and j (sector) correspond to those used in Table 1 and the China Energy Statistical Yearbook. We merged the fuels into 17 types because certain fuels are consumed in small quantities and are similar in quality to others, as shown in Table 1. Among the 17 fossil fuels, raw coal, crude oil, and natural gas are primary energy sources. The remaining 14 fuels are classified as secondary energy sources, which are extracted or processed from primary sources. The 47 sectors used in the energy statistical system are also consistent with those used in China's national economic accounting 23 (see Table 2). Because all the administrative boundaries (at both the national and provincial scales) span both urban and rural geographies in China, urban and rural households are listed separately in the multiscale CO2 emission inventories.
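As a minimal sketch of equation (1), the following Python snippet multiplies fuel consumption by the fuel-specific net calorific value and carbon content and by the fuel- and sector-specific oxygenation efficiency. The function name `sectoral_emissions` is hypothetical and all numbers are made-up placeholders rather than values from Table 1; the authors' own implementation is the MATLAB code mentioned later in this section. For readability the sketch indexes fuels as rows and sectors as columns.

```python
def sectoral_emissions(ad, ncv, cc, ox):
    """Return CE[i][j] = AD[i][j] * NCV[i] * CC[i] * O[i][j].

    ad  : consumption of fuel i in sector j (physical units)
    ncv : net calorific value of fuel i
    cc  : CO2 emitted per unit of net calorific value for fuel i
    ox  : oxygenation efficiency of fuel i in sector j (0..1)
    """
    return [
        [ad[i][j] * ncv[i] * cc[i] * ox[i][j] for j in range(len(ad[i]))]
        for i in range(len(ad))
    ]


# Placeholder inputs: 2 fuels x 2 sectors (illustrative only).
ad = [[10.0, 20.0], [5.0, 0.0]]
ncv = [0.2, 0.4]
cc = [25.0, 20.0]
ox = [[0.9, 0.8], [1.0, 1.0]]

ce = sectoral_emissions(ad, ncv, cc, ox)
fuel_totals = [sum(row) for row in ce]   # emissions per fuel
total = sum(fuel_totals)                 # emissions over all fuels and sectors
```

Swapping in a different set of emission factors only requires changing `ncv`, `cc`, or `ox`, which mirrors the recalculation use case the dataset is meant to support.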
Fossil fuels used as chemical raw materials ('non-energy use' in the Energy Balance Table), as well as the energy lost during transportation, were removed from the total fossil fuel consumption to avoid double counting. Fossil fuels that enter energy conversion processes without being burnt were also excluded, as these processes involve little CO2 emission. Taking coal washing as an example, the carbon in raw coal is transferred into cleaned coal and other washed coal during the process; the actual CO2 emissions occur when the cleaned coal and other washed coal are combusted. Other similar processes include 'coking', 'petroleum refineries', 'gas works', and 'briquettes'. Only fossil fuels burnt during the transformation processes, i.e., 'thermal power' and 'heating supply', were taken into account in the emission calculation.
Emissions from electricity/heat generated within city boundaries were counted based on the energy input for power/heat generation ('thermal power' and 'heating supply') and were allocated to the electricity generation sector 24 . Our administrative territorial emission inventories excluded emissions from electricity and heat imported from outside the national or provincial boundary. We focused only on fossil fuels consumed within the national or provincial boundary.
The national sectoral fossil fuel consumption (AD_ij) was collected from the Energy Statistical Yearbooks published officially by the National Bureau of Statistics of China 25 . China has officially revised its national energy statistics four times since 2000 (in the 2004, 2005, 2009, and 2014 editions of the China Energy Statistical Yearbook). Each revision modified the energy balance sheets and sectoral energy consumption. For example, the 2014 revision raised the total energy consumption of 2011 from 3,480 to 3,870 million tonnes of standard coal equivalent (in coal equivalent calculation), an increase of 11.2%. Our emission inventories were calculated based on the most up-to-date energy data published after 2014 25 .
At the provincial scale, the China Energy Statistical Yearbooks publish only each province's energy balance table every year. We collected the total consumption of the 17 fossil fuels from the balance table and then used each province's sectoral fossil fuel consumption structure to allocate the total consumption across sectors. Most of the provinces' sectoral fossil fuel consumption was collected from the provinces' corresponding statistical yearbooks. For certain provinces (Hebei, Jiangsu, Zhejiang, Shandong, Guangxi, Hainan, Sichuan, and Guizhou) that do not have the data in their yearbooks, we used the national economic census data from 2008 26 , on the assumption that the industry structure was stable during the intervening years.
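The allocation step described above can be sketched as a proportional split: the balance-table total for one fuel is distributed across sectors according to each sector's share in the sectoral consumption data. This is an illustrative reading of the procedure, not the authors' code; the function name `allocate_by_shares`, the sector labels, and the numbers are hypothetical.

```python
def allocate_by_shares(total, sector_consumption):
    """Split a balance-table total across sectors in proportion to
    the sectoral consumption reported in another source."""
    denominator = sum(sector_consumption.values())
    if denominator <= 0:
        raise ValueError("sectoral consumption must sum to a positive value")
    return {
        sector: total * value / denominator
        for sector, value in sector_consumption.items()
    }


# Placeholder example: 100 units of one fuel in the provincial balance table,
# split using a sectoral structure taken from a yearbook or the 2008 census.
allocated = allocate_by_shares(100.0, {"thermal power": 30.0, "households": 10.0})
```

By construction the allocated values sum to the balance-table total, so the provincial total is preserved regardless of which year's sectoral structure supplies the shares.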
Both the IPCC and the National Development and Reform Commission of China (NDRC) have published default factors (NCV_i, CC_i) for China. Most current research uses the IPCC default values. According to our previous survey on China's fossil fuel quality and cement process 10 , the IPCC default emission factors are approximately 40% higher than China's survey values. In our datasets, we used the updated emission factors; see Table 1. As our previous study reported the emission factors of only the three primary fossil fuels (i.e., raw coal, crude oil, and natural gas), we estimated the emission factors of the other 14 secondary fossil fuels by scaling down the NDRC values according to the ratio of the updated primary fossil fuels' emission factors to those of the NDRC. We used the raw coal ratio to update the emission factors of coal-related fuels and the crude oil ratio to update those of oil-related fuels. For O_ij, our datasets adopted different oxygenation efficiencies for the fossil fuels used in different sectors 27 , which represent the different combustion technology levels of the sectors (shown in Table 3 (available online only)).
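The scaling of secondary-fuel factors can be illustrated with a short sketch: each NDRC default factor for a secondary fuel is multiplied by the ratio of the updated to the NDRC factor of its parent primary fuel. The function name `scale_secondary_factor` and the numbers are hypothetical placeholders chosen for easy arithmetic, not values from Table 1.

```python
def scale_secondary_factor(ndrc_secondary, ndrc_primary, updated_primary):
    """Scale an NDRC default factor of a secondary fuel by the
    updated/NDRC ratio of the corresponding primary fuel."""
    if ndrc_primary <= 0:
        raise ValueError("NDRC primary factor must be positive")
    return ndrc_secondary * (updated_primary / ndrc_primary)


# Placeholder numbers: the primary fuel's factor drops from 2.0 (NDRC) to 1.5
# (updated), so a coal-related secondary fuel at 3.0 is scaled by 0.75.
scaled = scale_secondary_factor(ndrc_secondary=3.0, ndrc_primary=2.0,
                                updated_primary=1.5)
```

Under this scheme every coal-related fuel shares the raw coal ratio and every oil-related fuel shares the crude oil ratio, so the relative ordering of the NDRC factors within each fuel family is preserved.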
We used MATLAB R2014a to construct the emission inventories from the sectoral fossil fuel consumption and emission factors. We provided the code in the Supplementary Information. We also provided the formatted energy data of China and its provinces (energy inventories) in our datasets for additional data transparency and verifiability (see Data Citation 1, File 'China national energy inventory, 2000-2015' and File 'China provincial energy inventory, 1997-2015'). Researchers will be able to use the MATLAB code and energy inventories to recalculate the CO2 emissions for China by adopting different emission factors.

Energy-related reference approach emissions
Apart from the sectoral approach, the energy-related emissions of one region can also be estimated using the reference approach. 'The Reference Approach is a top-down approach, using a country's energy supply data to calculate the emissions of CO2 from combustion of mainly fossil fuels. The Reference Approach is a straightforward method that can be applied on the basis of relatively easily available energy supply statistics (Volume 2, Chapter 6, Page 5)' 22 . The IPCC suggests 'to apply both a sectoral approach and the reference approach to estimate a country's CO2 emissions from fuel combustion and to compare the results of these two independent estimates (Volume 2, Chapter 6, Page 5)' 22 . The reference emissions can be used to verify and support the sectoral emissions.
As the reference emissions were calculated from the fossil fuels' production base, we only considered three primary fossil fuels (raw coal, crude oil, and natural gas). With the assumption of carbon balance, the carbon in the supply of the 3 primary fossil fuels should be equal to the carbon contained in the total consumption of the 17 fossil fuels 9 . We calculated the reference approach emissions as in equation (2):

$$CE_{\mathrm{ref}-i} = AD_{\mathrm{ref}-i} \times EF_i \qquad (2)$$

where CE_ref-i refers to the reference CO2 emissions from fossil fuel i, and EF_i and AD_ref-i are the emission factor and apparent consumption of the corresponding fossil fuel, respectively. The emission factors for the 3 primary fossil fuels are the same as those used in the sectoral approach emissions calculation 10 . Values of AD_ref-i were calculated as in equation (3). As in the sectoral approach, we removed the non-energy use and loss parts from the fuel's apparent consumption. The items in brackets were only used to calculate the apparent consumption of provinces and were skipped when calculating the national consumption.
All the items in equation (3) (at both the national and provincial scales) were collected from the most up-to-date energy balance tables published officially in the China Energy Statistical Yearbooks 25 .
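Equation (3) itself is not reproduced in this text, so the following Python sketch rests on an assumption: that apparent consumption follows the usual energy-balance form (production plus imports minus exports, adjusted for stock change, with non-energy use and loss removed, and with inter-provincial transfers added only at the provincial scale). The function names, the sign convention for `stock_change` (positive means a net withdrawal from stocks), and all numbers are hypothetical.

```python
def apparent_consumption(production, imports, exports, stock_change,
                         non_energy_use, loss, moved_in=0.0, sent_out=0.0):
    """Assumed balance for one primary fuel.

    moved_in / sent_out are inter-provincial transfers; leave them at 0.0
    for the national calculation, as the text says the bracketed items
    are skipped at the national scale.
    """
    return (production + imports - exports
            + moved_in - sent_out
            + stock_change
            - non_energy_use - loss)


def reference_emissions(apparent, emission_factor):
    """Equation (2): reference emissions = apparent consumption x emission factor."""
    return apparent * emission_factor


# Placeholder provincial example for one primary fuel.
ad_ref = apparent_consumption(production=100.0, imports=20.0, exports=10.0,
                              stock_change=-2.0, non_energy_use=3.0, loss=1.0,
                              moved_in=5.0, sent_out=15.0)
ce_ref = reference_emissions(ad_ref, emission_factor=2.0)
```

Summing `ce_ref` over raw coal, crude oil, and natural gas would give the energy-related reference emissions of the region under these assumptions.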

Process-related (cement) CO2 emissions
The process-related emissions refer to CO2 emitted as a result of physical-chemical reactions in the production process and not from the energy combusted by the industry 28 . 'The fossil fuels used in this transformation stage are considered the carbon emissions from fossil fuel combustion performed by the industrial sectors and are not considered as the industrial process emissions (page 240)' 29 . In this study, we only investigated cement production, which accounts for approximately 75% of China's total process-related CO2 emissions 7 . We calculated the cement-related CO2 emissions as in equation (4):

$$CE_t = AD_t \times EF_t \qquad (4)$$

where CE_t refers to the process-related CO2 emissions from cement production and AD_t is the activity data for cement-related emissions accounting, which refers to cement production. We collected data on the cement production of China and its provinces from the official dataset of the National Bureau of Statistics 30 , which is consistent with the China Statistical Yearbooks 31 . The expression EF_t refers to the emission factor for cement production, which is 0.2906, also collected from Liu, et al. 10 . The cement-

Comparison of the sectoral- and reference-approach emission inventories
The difference between the sectoral- and reference-approach emission inventories lies in the way the fossil fuel consumption was calculated when estimating the energy-related emissions. The process-related emissions from the two approaches were exactly the same. The sectoral emissions were calculated from the energy consumption side, while the reference emissions were calculated from the energy production and trade data. The reference approach assumed that all the carbon from the primary energy sources (excluding the transport loss and non-energy use) was converted into CO2 emissions. The IPCC suggests calculating the reference emissions for a country as a validation of the sectoral emissions. Therefore, we calculated both the sectoral and reference emissions for China and its provinces in our datasets. The red lines in Fig. 2 compare the sectoral and reference emissions. Our reference emissions were 1 to 7% higher than the sectoral emissions. The differences between the two approaches can be explained by three factors. First, the energy loss during energy transformation processes was not excluded from the reference energy consumption. Second, only the transport loss and non-energy use of primary energy sources were excluded from the total consumption in the reference approach; those of secondary energy sources were not removed. Third, there was a statistical difference of roughly 1.2% between the energy production and consumption data in China's energy balance table 12 .
As discussed in the energy-related reference approach emissions section above, the reference emissions were calculated with the data of primary fossil fuels only, so the emissions embodied in the secondary fossil fuels cannot be reflected. Because energy is frequently traded among Chinese provinces, especially secondary energy types, the provincial reference emissions cannot reflect the real CO2 emissions within a provincial boundary. For the sake of data completeness and transparency, we nevertheless provided the provincial reference emission inventories in our datasets for reference.

Data Records
A total of 1,172 data records (emission and energy inventories) are contained in the datasets. Of these,
Our CO2 emission inventories were constructed in a uniform format. The sectoral approach emission inventories are matrices with 19 columns and 47 rows, as shown in Table 4 (available online only) (an example of the China CO2 emission inventory, 2015). The 19 columns are 17 fossil fuel-related emissions, cement-related emissions and total emissions. The 47 rows represent the 47 socioeconomic sectors. Each element of the matrices represents the CO2 emissions from fossil fuel combustion/cement production in the corresponding sector. The sectoral and reference approach inventories include emissions from every individual item (e.g., production and import) of the three primary energy sources and the cement process. As an example, Table 5 presents the sectoral and reference approach emission inventories for China from 2000 to 2015.
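The 47 x 19 layout described above can be sketched as follows: each sector row holds the 17 fuel-related emissions, then the cement-related emissions, then the row total. The function name `build_inventory`, the row index used for cement, and the numbers are hypothetical; which sector row actually carries the cement emissions is not stated in this text.

```python
N_SECTORS, N_FUELS = 47, 17


def build_inventory(energy, cement_by_sector):
    """Assemble a 47 x 19 matrix: 17 fuel columns, cement, total.

    energy           : 47 rows of 17 fuel-related emission values
    cement_by_sector : 47 cement-related emission values (mostly zero)
    """
    if len(energy) != N_SECTORS or len(cement_by_sector) != N_SECTORS:
        raise ValueError("expected 47 sector rows")
    rows = []
    for fuel_emissions, cement in zip(energy, cement_by_sector):
        if len(fuel_emissions) != N_FUELS:
            raise ValueError("expected 17 fuel columns per sector")
        rows.append(list(fuel_emissions) + [cement, sum(fuel_emissions) + cement])
    return rows


# Placeholder data: 1.0 for every fuel in every sector, cement in one
# hypothetical sector row.
energy = [[1.0] * N_FUELS for _ in range(N_SECTORS)]
cement = [0.0] * N_SECTORS
cement[5] = 3.0  # hypothetical row index for the cement-producing sector
inventory = build_inventory(energy, cement)
```

Keeping the total as the last column means the grand total of an inventory is simply the sum of that column.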

Technical Validation

Uncertainty analysis
Uncertainty analysis is an important tool for improving emission inventories, and uncertainty estimates are an essential element of a greenhouse gas emissions inventory. Considering the small amounts and low uncertainties of the process-related emissions in cement production 10,32 , we only calculated the uncertainties of the energy-related emissions in this study. The uncertainties of an inventory arise from many sources. As the energy-related CO2 emissions were calculated as fossil fuel consumption (activity data) multiplied by the emission factors, the uncertainties should be 'derived for the component parts such as emission factors, activity data and other estimation parameters (Volume 1, Chapter 3, Page 6)' 22 . We quantified the uncertainties of both the emission factors and the fossil fuel consumption data for our datasets.
As introduced above in the Methods section, this study adopted the emission factors from Liu, et al. 10 (see Table 8 (available online only)). The fuels' net calorific values varied over a wider range than their carbon contents and oxygenation efficiencies. Taking raw coal as an example, the Coefficient of Variation (CV, the standard deviation divided by the mean) of its net calorific value is 15%, while the CVs of carbon content and oxygenation efficiency are 2% and 4%, respectively. The CV of raw coal's comprehensive emission factor (NCV_i × CC_i × O_i) is 18%. The emission factors of coal-related fuels varied over a wider range than those of oil-related fuels and natural gas: the average CV of coal-related fuels is 18%, while those of oil-related fuels and natural gas are 4% and 5%, respectively. Among the emission factors from eight sources, the IPCC and UN-average have the highest values, while Liu et al.'s study (used in this study), MEIC and NC1994 have the lowest values.
Due to the poor quality of China's fossil fuel data, the fossil fuel consumption data also have large uncertainties. According to the previous literature, the fossil fuel consumed in the electricity generation sector had a CV of 5% 37,38 , while the fossil fuel consumed in the other industry and construction sectors had a CV of 10% 22,39 . The CV of fossil fuel consumed in the transportation sector was 16% 40 , while residential and primary industry fossil fuel usage had even higher CVs of 20% 22 and 30% 41 , respectively. The uncertainties in China's fossil fuel data have been addressed and discussed previously by Guan, et al. 11 . Possible reasons include the opaqueness of China's statistical systems, especially regarding the 'statistical approach on data collection, reporting and validation (Page 673)' 11 ; and the dependence of China's statistics departments on other government departments.
As a result, China's national fossil fuel consumption is smaller than the provincial aggregated data. Although China revised its 2000-2013 national energy data upward in 2014, there was still a gap of roughly 5% between the latest national and provincial aggregated energy data. We employed Monte Carlo simulations to propagate the uncertainties induced by both fossil fuel consumption and emission factors and thereby provide uncertainty estimates for the entire emission inventories 22 . Following the Monte Carlo technique, we first assumed normal distributions (probability density functions) for both the activity data (fossil fuel consumption) and the emission factors, with the CVs discussed above 10,32 . Random sampling of both the activity data and the emission factors was then conducted 100,000 times, generating 100,000 estimates of the CO2 emissions. The uncertainty range was therefore taken as the 97.5% confidence interval of the estimates. The above simulation was conducted in MATLAB R2014a.
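The Monte Carlo procedure can be sketched in a few lines of standard-library Python. This is not the authors' MATLAB code: the function name `monte_carlo_range`, the input tuples, and the reading of the '97.5%' range as a central interval (equal tails on both sides) are assumptions, and the numbers are placeholders. Because the sketch draws from symmetric normal distributions around a single emission factor, it will not reproduce an asymmetric range.

```python
import random


def monte_carlo_range(items, level=0.975, n=100_000, seed=0):
    """Propagate activity-data and emission-factor uncertainty.

    items : list of (activity, emission_factor, cv_activity, cv_factor)
    Returns the lower and upper bounds of the central `level` interval,
    expressed as relative deviations from the deterministic estimate.
    """
    rng = random.Random(seed)
    central = sum(a * ef for a, ef, _, _ in items)
    draws = sorted(
        sum(rng.gauss(a, cv_a * a) * rng.gauss(ef, cv_ef * ef)
            for a, ef, cv_a, cv_ef in items)
        for _ in range(n)
    )
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * (n - 1))]
    upper = draws[int((1.0 - tail) * (n - 1))]
    return lower / central - 1.0, upper / central - 1.0


# Placeholder sectors: (activity, emission factor, CV of activity, CV of factor).
sectors = [(100.0, 2.0, 0.05, 0.18), (50.0, 3.0, 0.10, 0.04)]
low, high = monte_carlo_range(sectors, n=20_000, seed=1)  # fewer draws for speed
```

Setting one of the two CVs to zero for every item isolates the contribution of the other, since a normal draw with zero standard deviation returns the mean.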
We found that the uncertainties of the entire CO2 emission inventories were roughly (−15%, 25%) at a 97.5% confidence level. Table 9 (available online only) and Fig. 4 show the uncertainties in the national emission inventories from 2000 to 2015. The above ranges, e.g., (−15%, 25%), reflect the uncertainties from both the emission factors and the activity data. In particular, given the continuing debate on the emission factors of fossil fuel combustion in China 33-36 , we incorporated 8 emission factors from independent sources to represent the uncertainty of the emission factors. To separate the uncertainties induced by the emission factors and the activity data, we then conducted the Monte Carlo simulations again, assuming the CV of one of them was 0. The results showed that the uncertainties from the emission factors in 2015 were (−15.8%, 23.7%), while the uncertainties from the activity data were (−1.4%, 9.2%). This implies that the emission factors of fossil fuels contributed more uncertainty to the final estimates than the activity data did.
In addition, the emissions calculated based on the provincial aggregated energy data are about 5% higher than those based on the national data, due to the difference between the national and provincial data.
In addition to the uncertainties of emission factors and fossil fuel data considered in the Monte Carlo simulations above, there are other uncertainties that should be taken into consideration when using the datasets, such as 'lack of completeness', 'lack of data', and 'measurement error'. These uncertainties are very small and difficult to quantify; however, they are also essential parts of the inventories' uncertainties. 1) Lack of completeness: We only considered the energy-related and cement-related emissions in our datasets. Emissions from other sources were not taken into account, such as 'agriculture', 'land-use change and forestry', 'waste', and other industrial processes. 2) Lack of data: As discussed above, the sectoral fossil fuel consumption of 8 provinces was lacking. We used the sectoral fossil fuel consumption structure in 2008 to estimate that of the intervening years. Such a replacement had little effect on the total emissions, but increased the uncertainties in the provincial sectoral emissions. Also, the emission factors for secondary fossil fuels were estimated based on the primary fossil fuel emission factors' ratio. 3) Measurement error: the 'measurement error is random or systematic, results from errors in measuring, recording and transmitting information; inexact values of constants and other parameters obtained from external sources (Volume 1, Chapter 3, Page 11)' 22 . Measurement errors might arise in the energy statistics and in the calculation of the emission factors.

Comparison with existing emission estimates
We compared our emissions with estimates from other research institutes, shown in Fig. 2. We found that our national sectoral emissions were the lowest among the estimates. The Global Carbon Budget (GCB) had the highest value until EDGAR overtook it from 2012 onward. Our national sectoral emissions were 9 to 18% lower than the highest value. This was mainly because we used the updated emission factors, which were lower than the IPCC default values. Since 2013, our results have been 1-3% higher than those of BP and MEIC. Even allowing for the fact that the BP and MEIC estimates do not include cement-related emissions, they are closer to our datasets than the other emission estimates. Our estimates were highly consistent with the newly published official emission inventory. The Chinese government published the 'First Biennial Update Report on Climate Change 8 ' by the end of 2016. In the report, the energy-related CO2 emissions in 2012 were 8,688 million tonnes (the blue points in Fig. 2), 242 million tonnes above our estimate (national sectoral emissions, 8,446 million tonnes), a gap of 2.79% of the official figure (2.87% of our estimate). This small difference falls within the uncertainty range of both inventories.
In terms of format, the existing emission estimates only present the total energy-related emissions of the whole country, or emissions from three fossil fuel categories at most (solid, liquid, and gas). Our datasets provide the energy-related CO2 emissions from 47 socioeconomic sectors and 17 fossil fuels, giving a detailed picture of the emission status of China and its provinces. Thus, our datasets can serve as a more detailed supplement to the existing emission estimates and the official emission inventories.

Limitations
Our datasets have the following limitations: 1) We used the national average emission factors of fossil fuels and cement production when calculating the provincial CO2 emissions in the current version. The emission factors should differ between regions, given the discrepancies in energy quality and cement production technology. In future research, we will specify the emission factors of each province to achieve more accurate provincial emission inventories. 2) In the current version, we used the sectoral fossil fuel consumption structure in 2008 to estimate that of the intervening years for 8 provinces. In the future, we will investigate these 8 provinces to obtain more accurate data. 3) We only considered emissions from cement production in the current process-related emissions accounts. The latest official emission inventory, for 2012, includes 9 other processes, such as glass, lime, and steel production. In future research, we will extend the scope of our datasets to include more industrial processes.