Chinese environmentally extended input-output database for 2017 and 2018

Environmental footprint analyses for China have received sustained attention in the literature. Such analyses rely on high-quality environmentally extended input-output (EEIO) databases built on benchmark input-output (IO) tables. The Chinese environmentally extended input-output (CEEIO) database series provides publicly available EEIO databases for China for 1992, 1997, 2002, 2007, and 2012, with consistent and transparent data sources and database structure. Based on the latest benchmark IO tables for China for 2017 and 2018, we develop the corresponding 2017 and 2018 CEEIO databases following the same method used for the previous CEEIO databases. The 2017 and 2018 CEEIO databases cover 44 and 28 types of environmental pressures, respectively, and support multiple sector classifications, including ones consistent with previous CEEIO databases and ones following China’s 2017 national economy industry classification standard. A notable improvement in the 2017 and 2018 CEEIO databases is the comprehensive inclusion of CO2 emissions from additional industrial processes. This work provides a consistent update of the CEEIO database and enables a wide range of timely environmental footprint analyses related to China.


Methods
Scope. For the 2017 CEEIO database, we consider 44 types of environmental pressures in four broad categories, generated by domestic sectors and households in China, based on the available data: (1) freshwater consumption; (2) 23 types of atmospheric pollutants, including carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), nitrogen oxides (NOx), dust and soot, sulfur dioxide (SO2), hazardous trace elements (HTEs, including Hg, As, Se, Pb, Cd, Cr, Ni, Sb, Mn, Co, Cu, and Zn), particulate matter with an aerodynamic diameter of 2.5 μm or less (PM2.5), particulate matter with an aerodynamic diameter of 10 μm or less (PM10), carbon monoxide (CO), volatile organic compounds (VOCs), and ammonia (NH3); (3) 13 types of water pollutants, including chemical oxygen demand, ammonia nitrogen compounds, phosphorus, petroleum pollutants, volatile phenols, cyanide, aquatic Hg, aquatic Cd, aquatic Cr, aquatic Pb, aquatic As, aquatic Cu, and aquatic Zn; and (4) 7 types of solid waste, including industrial solid waste, plastic film, crop straws, animal manure, sludge, medical waste, and household waste. The 2018 CEEIO database covers 28 of these 44 types of environmental pressures, based on the available data: (1) fuel combustion by sector [18], (2) the output of industrial products [19], (3) crop yields, (4) the numbers of livestock and poultry, (5) freshwater consumption for agriculture, industry, and domestic use [20], and (6) the amount of plastic film used [21].

Data sources and estimation for environmental pressures.
Here we describe in detail how the 2017 CEEIO database is developed; the 2018 database is developed in a similar way. Compared with the previous CEEIO databases, the major improvements are that we (1) update the emission factors of environmental pressures, including greenhouse gases, nitrogen oxides, atmospheric Hg, atmospheric As, and atmospheric Se, for the combustion of coal, natural gas, and petroleum products; (2) cover additional industrial processes (e.g., the production of cement, synthetic ammonia, and coke) when accounting for nitrogen oxide emissions; (3) add 14 types of environmental pressures, namely atmospheric Pb, atmospheric Cd, atmospheric Cr, atmospheric Ni, atmospheric Sb, atmospheric Mn, atmospheric Co, atmospheric Cu, atmospheric Zn, PM2.5, PM10, CO, VOCs, and NH3; and (4) adjust the sector classification according to the new national economy industry classification standard in use since 2017. Figure 1 shows the process of developing the 2017 CEEIO database, which also applies to the 2018 one. In the 2017 CEEIO database, we include the emissions of 19 types of air pollutants (CO2, CH4, N2O, NOx, CO, HTEs, VOCs, and NH3) from both the combustion of 26 types of fuel and industrial processes. Specifically, the 26 types of fuel are raw coal, washed coal, other washed coal, briquettes, crude petroleum, natural gas, coke, other coking products, gasoline, kerosene, diesel oil, fuel oil, naphtha, lubricating oil, paraffin oil, solvent oil, bitumen, petroleum coke, other petroleum products, liquefied petroleum gas, coke oven gas, blast furnace gas, converter gas, other gas, liquefied natural gas (LNG), and refinery dry gas.
Industrial processes considered in this study are those producing crude petroleum, natural gas, refined edible vegetable oil, paper, coke, sulfuric acid, caustic soda, soda ash, ethylene, ammonia, chemical fertilizer, primary plastic, synthetic rubber, cement clinker, lime, glass, pig iron, crude steel, rolled steel, copper, electrolyzed aluminum, and alumina. The emissions of the 19 air pollutants are calculated using Eq. 1:

$$AP^{i}_{j} = \sum_{k=1}^{26} \varepsilon^{k}_{ij}\, c^{k}_{i} + \sum_{k=1}^{n} \gamma^{k}_{j}\, p^{k}_{i} \tag{1}$$

where $AP^{i}_{j}$ is the discharge of the j-th air pollutant generated by sector i, $\varepsilon^{k}_{ij}$ is the emission factor of the j-th air pollutant from the combustion of the k-th type of fuel in sector i, $c^{k}_{i}$ is the total amount of the k-th type of fuel consumed by sector i, $\gamma^{k}_{j}$ is the emission factor of the j-th air pollutant for the industrial process of producing product k, $p^{k}_{i}$ is the total amount of product k produced by sector i, and n is the number of products from sector i, which varies across sectors. Data for $\varepsilon^{k}_{ij}$ and $\gamma^{k}_{j}$ are from the Intergovernmental Panel on Climate Change (IPCC) [22][23][24], the Ministry of Ecology and Environment of the People's Republic of China (MEEPRC) [25,26], and previous studies [27][28][29][30][31][32][33]. In addition, the sector of Electricity & Heat Production and Supply consumes fuel in its energy conversion process. We therefore also account for the air pollutants generated by this intermediate energy conversion process. Moreover, we update the NOx emissions of each sector based on China's national NOx emissions inventory for 2017 from the Communique of China's Second National Survey of Pollution Sources (CCSNSPS) [34].
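As a minimal sketch of the accounting in Eq. 1, the Python snippet below sums fuel-combustion and industrial-process emissions for one sector and one pollutant. The function name, the dictionary layout, and all numbers are hypothetical placeholders chosen for illustration; they are not values from the CEEIO database or its sources.

```python
def sector_emission(fuel_use, fuel_ef, product_output, process_ef):
    """Emission of one pollutant j from one sector i, following Eq. 1.

    fuel_use:       {fuel k: amount of fuel consumed by the sector}
    fuel_ef:        {fuel k: combustion emission factor for pollutant j}
    product_output: {product k: amount produced by the sector}
    process_ef:     {product k: process emission factor for pollutant j}
    """
    # First term of Eq. 1: emissions from fuel combustion.
    combustion = sum(fuel_ef[k] * amount for k, amount in fuel_use.items())
    # Second term of Eq. 1: emissions from industrial processes.
    process = sum(process_ef[k] * amount for k, amount in product_output.items())
    return combustion + process


# Hypothetical inputs (illustrative only, not CEEIO data).
fuel_use = {"raw coal": 100.0, "natural gas": 20.0}
fuel_ef = {"raw coal": 2.0, "natural gas": 1.5}
product_output = {"cement clinker": 50.0}
process_ef = {"cement clinker": 0.5}

total = sector_emission(fuel_use, fuel_ef, product_output, process_ef)
print(total)  # 100*2.0 + 20*1.5 + 50*0.5 = 255.0
```

A sector without process emissions is handled by passing empty dictionaries for the product arguments, in which case only the combustion term contributes.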
The amounts of PM2.5 and PM10 emissions are calculated using Eq. 2:

$$AP^{i}_{j} = \sum_{k=1}^{26} \varepsilon^{k}_{ij}\, c^{k}_{i} \left(1 - r^{k}_{ij}\right) + \sum_{k=1}^{n} \gamma^{k}_{j}\, p^{k}_{i} \left(1 - s^{k}_{j}\right) \tag{2}$$

where $r^{k}_{ij}$ is the average removal rate of the j-th air pollutant (PM2.5 or PM10) from the combustion of the k-th fuel in sector i, and $s^{k}_{j}$ is the average removal rate of the j-th air pollutant in the industrial process of producing product k. Data for $\varepsilon^{k}_{ij}$, $\gamma^{k}_{j}$, $r^{k}_{ij}$, and $s^{k}_{j}$ are from the MEEPRC [35,36] and Bai et al. [31]. Similarly, we also consider the PM2.5 and PM10 [...]

[...] The corrected amounts are calculated using Eq. 3:

$$tp^{i}_{j} = \frac{e^{i}_{j}\, x_{i}}{\sum_{i} e^{i}_{j}\, x_{i}} \times TOP_{j} \tag{3}$$

where $tp^{i}_{j}$ is the amount of the j-th environmental pressure from sector i after correction, $e^{i}_{j}$ is the total intensity of the j-th environmental pressure in sector i, $x_{i}$ is the gross output of sector i at constant 2012 prices, and $TOP_{j}$ is the total emission of the j-th environmental pressure in 2017.
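The removal-rate adjustment for PM2.5 and PM10 (Eq. 2) can be sketched as follows. This is an illustrative Python sketch under the assumption that each removal rate is a fraction between 0 and 1 applied to the uncontrolled emission; the function name and all numbers are hypothetical.

```python
def pm_emission(fuel_use, fuel_ef, fuel_removal,
                product_output, process_ef, process_removal):
    """PM emission of one sector after end-of-pipe removal, following Eq. 2.

    Removal rates are fractions in [0, 1]; only the share (1 - rate)
    of the uncontrolled emission is counted.
    """
    for rate in list(fuel_removal.values()) + list(process_removal.values()):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("removal rates must lie between 0 and 1")
    combustion = sum(
        fuel_ef[k] * amount * (1.0 - fuel_removal[k])
        for k, amount in fuel_use.items()
    )
    process = sum(
        process_ef[k] * amount * (1.0 - process_removal[k])
        for k, amount in product_output.items()
    )
    return combustion + process


# Hypothetical inputs (illustrative only, not CEEIO data).
pm = pm_emission(
    fuel_use={"raw coal": 100.0},
    fuel_ef={"raw coal": 4.0},
    fuel_removal={"raw coal": 0.75},
    product_output={"cement clinker": 40.0},
    process_ef={"cement clinker": 2.0},
    process_removal={"cement clinker": 0.5},
)
print(pm)  # 4.0*100*(1-0.75) + 2.0*40*(1-0.5) = 140.0
```

Setting every removal rate to zero reduces this calculation to the uncontrolled form used for the other air pollutants.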
The National Bureau of Statistics of China (NBSC) provides the total amounts of freshwater consumed for agriculture, industry, and other uses (including construction, tertiary industry, and households) in 2017 and 2018 [20,37]. We estimate the freshwater consumption of Crop Cultivation, Forestry, Livestock and Livestock Products, Fishery, and Technical Services for Agriculture using Eq. 4:

$$w_{i} = \frac{z^{w}_{i}}{\sum_{i} z^{w}_{i}} \times W_{A} \tag{4}$$

where $w_{i}$ is the amount of freshwater consumed by sector i, $z^{w}_{i}$ is the intermediate input from the sector of Water Production and Supply to sector i, $W_{A}$ is the total freshwater consumption of the agriculture sector, and the sum runs over the five agricultural sectors listed above. The freshwater consumption of the mining, manufacturing, and service sectors, as well as of urban and rural households, is obtained by multiplying the output at constant 2012 prices by the freshwater consumption per unit of output of each sector from the 2012 CEEIO database. The result is scaled to match the aggregate freshwater consumption for these sectors in 2017 and 2018, respectively.
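Both steps in this paragraph are proportional allocations: a known total is split across sectors according to sector-level weights. The Python sketch below illustrates this under stated assumptions; the helper name `allocate_by_share` and all numbers are hypothetical, and only three of the agricultural sectors are shown to keep the example short.

```python
def allocate_by_share(weights, total):
    """Split `total` across the keys of `weights` in proportion to the weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    return {key: total * w / weight_sum for key, w in weights.items()}


# Step 1 (Eq. 4): agricultural water use allocated by each sector's
# intermediate input from Water Production and Supply (hypothetical values).
water_inputs = {"Crop Cultivation": 6.0, "Forestry": 1.0, "Fishery": 1.0}
agri_water = allocate_by_share(water_inputs, total=400.0)
print(agri_water)
# {'Crop Cultivation': 300.0, 'Forestry': 50.0, 'Fishery': 50.0}

# Step 2: preliminary estimates (2012 intensity x constant-price output)
# rescaled so that they add up to a known aggregate (hypothetical values).
intensity_2012 = {"Mining": 0.5, "Services": 0.25}
output = {"Mining": 100.0, "Services": 200.0}
preliminary = {s: intensity_2012[s] * output[s] for s in output}
scaled = allocate_by_share(preliminary, total=120.0)
print(scaled)  # {'Mining': 60.0, 'Services': 60.0}
```

Using one helper for both steps makes explicit that the scaling preserves the relative shares of the preliminary estimates while forcing their sum to equal the published total.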
The CCSNSPS provides the amounts of sludge and medical waste in 2017 [25]. The emissions of these two solid wastes in 2018 are estimated by multiplying the total intensity by the output, at constant 2017 prices, of Water Production and Supply and Health Services, respectively. We estimate rural household waste by multiplying the rural population by the generation factor of household waste in rural areas in China (0.86 kg per capita per day) [38], while the NBSC provides the amount of urban household waste [20,37]. We use Eqs. 5, 6, and 7 to estimate the amounts of plastic film, crop straws, and animal manure, respectively:

$$pf = u \times w \times r \tag{5}$$

$$cs = \sum_{i} c_{i}\, f_{i}\, d_{i} \tag{6}$$

$$am = \sum_{i} n_{i}\, m_{i}\, l_{i} \tag{7}$$

In Eq. 5, pf, u, w, and r represent the amount of plastic film waste, the amount of plastic film used on farmland [34], the scrap rate (0.58), and the emission rate (0.20) of plastic film, respectively [39]. In Eq. 6, cs, $c_{i}$, $f_{i}$, and $d_{i}$ represent the amount of crop straws, the yield of crop i, and the generation rate and emission rate of crop straws of crop i, respectively [40,41]. In Eq. 7, am is the amount of manure; $n_{i}$ is the number of livestock or poultry of type i in stock at the end of the year (the breeding cycle of pigs is generally 199 days according to the NBSC, so the number of pigs raised is calculated from the number of slaughtered fattened hogs) [37]; and $m_{i}$ and $l_{i}$ represent the generation rate and emission rate of manure of animal i, respectively.

Sector classification. The conversions between different sector classifications follow Eqs. 8 to 12 [...]

Figure 2 shows the sectoral shares of the 28 environmental pressures. We find that: (1) more than half (64.0%) of the freshwater consumption comes from the agricultural sector; (2) manufacturing is the major source (85.6%) of air pollutants; and (3) water pollutants mainly (84.3%) come from households, agriculture, and other services.
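The agricultural solid waste estimates described for Eqs. 5 to 7 above are products of an activity level, a generation (or scrap) rate, and an emission rate. The Python sketch below illustrates them; the scrap rate (0.58) and emission rate (0.20) of plastic film are taken from the text, while the function names and every other number are hypothetical placeholders.

```python
def plastic_film_waste(film_used, scrap_rate=0.58, emission_rate=0.20):
    """Eq. 5: plastic film waste from the amount of film used on farmland."""
    return film_used * scrap_rate * emission_rate


def rate_weighted_total(activity, generation_rate, emission_rate):
    """Eqs. 6 and 7: sum over items of activity x generation rate x emission rate.

    For crop straws, `activity` holds crop yields; for animal manure,
    it holds livestock or poultry numbers.
    """
    return sum(
        activity[item] * generation_rate[item] * emission_rate[item]
        for item in activity
    )


# Hypothetical crop data (illustrative only).
crop_straws = rate_weighted_total(
    activity={"rice": 200.0, "wheat": 100.0},
    generation_rate={"rice": 1.0, "wheat": 1.5},
    emission_rate={"rice": 0.5, "wheat": 0.5},
)
print(crop_straws)  # 200*1.0*0.5 + 100*1.5*0.5 = 175.0

# Hypothetical livestock data (illustrative only).
animal_manure = rate_weighted_total(
    activity={"cattle": 10.0, "pigs": 40.0},
    generation_rate={"cattle": 8.0, "pigs": 2.0},
    emission_rate={"cattle": 0.25, "pigs": 0.5},
)
print(animal_manure)  # 10*8.0*0.25 + 40*2.0*0.5 = 60.0

film_waste = plastic_film_waste(100.0)  # 100 units of film used (hypothetical)
```

Crop straws and animal manure share one helper here because Eqs. 6 and 7 have the same structure and differ only in the activity data supplied.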

Technical Validation
Uncertainties. There are three major sources of uncertainty in our data.
(1) According to the IPCC [23] and relevant studies [46][47][48], there are uncertainties in the emission factors of GHGs and HTEs for energy combustion and industrial processes, as well as in the amounts of industrial products. For instance, Supplementary Table 4 shows the uncertainties in CO2 emission factors for fuel combustion in the energy industry [16,17]. Table 1 shows the coefficients of variation of the data used to estimate HTE emissions. (2) The data on national energy consumption published by the NBSC are not consistent with the sum of the energy consumption published by the 30 provinces. This means that, owing to differences in statistical scope and systematic deviations in the statistical process, there are uncertainties in the energy consumption data of each sector. (3) When extending the 49-sector CEEIO database to the 149-sector CEEIO database for 2017 (153 sectors for 2018), and when merging the 149-sector CEEIO database into the 45-sector, 91-sector, and 96-sector CEEIO databases, we assume that the intensities of environmental pressures of the sub-sectors split from an aggregate sector are the same as that of the aggregate sector. This treatment cannot take the differences among the sub-sectors into account.
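The equal-intensity assumption in item (3) can be illustrated with a short Python sketch. It assumes that intensity means environmental pressure per unit of output, so that splitting distributes an aggregate sector's pressure in proportion to sub-sector output and merging sums pressures and outputs. The function names, sector labels, and numbers are hypothetical, and this is not the authors' MATLAB code.

```python
def split_sector(aggregate_pressure, sub_outputs):
    """Split an aggregate sector's pressure across sub-sectors,
    assuming every sub-sector has the aggregate sector's intensity."""
    intensity = aggregate_pressure / sum(sub_outputs.values())
    return {name: intensity * x for name, x in sub_outputs.items()}


def merge_sectors(pressures, outputs, mapping):
    """Merge detailed sectors into aggregate ones.

    mapping: {detailed sector: aggregate sector}
    Returns (merged pressures, merged intensities).
    """
    merged_pressure, merged_output = {}, {}
    for detailed, aggregate in mapping.items():
        merged_pressure[aggregate] = merged_pressure.get(aggregate, 0.0) + pressures[detailed]
        merged_output[aggregate] = merged_output.get(aggregate, 0.0) + outputs[detailed]
    intensity = {a: merged_pressure[a] / merged_output[a] for a in merged_pressure}
    return merged_pressure, intensity


# Hypothetical example: aggregate sector A with two sub-sectors.
sub_pressures = split_sector(75.0, {"A1": 100.0, "A2": 200.0})
print(sub_pressures)  # {'A1': 25.0, 'A2': 50.0}

# Merging the sub-sectors back (plus an unrelated sector B1) recovers A.
pressures = {"A1": 25.0, "A2": 50.0, "B1": 10.0}
outputs = {"A1": 100.0, "A2": 200.0, "B1": 40.0}
mapping = {"A1": "A", "A2": "A", "B1": "B"}
merged, merged_intensity = merge_sectors(pressures, outputs, mapping)
print(merged)            # {'A': 75.0, 'B': 10.0}
print(merged_intensity)  # {'A': 0.25, 'B': 0.25}
```

In this sketch A1 and A2 end up with the same intensity (0.25) by construction, which is exactly why the treatment cannot reflect real differences among sub-sectors.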
It is difficult to quantify the uncertainty caused by merging and splitting sectors, as there are not enough data to measure the differences between sub-sectors. Monte Carlo simulation has been applied to quantify the variations of the emission inventory due to the uncertainties in emission factors and activity data. All variables are assumed to follow a normal distribution in our analysis. Based on the coefficients of variation collected from relevant research, 100,000 random samples of the variables have been generated to estimate the range and distribution of emissions of 13 types of environmental pressures. Figure 4 shows the 97.5% confidence intervals of the emissions of the 13 environmental pressures. The uncertainty of atmospheric Se emissions turns out to be the highest (−20.2% to 25.8%), while atmospheric Hg has a relatively low uncertainty (−3.4% to 3.8%).

Comparisons with existing emission datasets. Figure 5 shows the sectoral CO2 emissions of China in 2017 in this study and in Shan et al. [16,17]. The total CO2 emissions in China based on this paper are 2.8 billion tons (29.65%) more than those in Shan et al. The difference mainly comes from the fact that, besides cement clinker, we consider CO2 emissions from additional industrial processes, such as the production of lime, glass, ammonia, soda ash, ethylene, coke, pig iron, crude steel, and primary aluminum (2,557 million tons), while Shan et al. did not.

Table 1. Coefficients of variation of the data used to estimate HTE emissions.

Source category                Data item                      Coefficient of variation
Coal combustion sources
  Coal consumption             Power plant                    5%
                               Industrial sectors             5%
Non-coal combustion sources
  Liquid fuel combustion       Liquid fuel consumption        5%
                               Emission factors               25%
  Nonferrous metal smelting    Nonferrous metal production    5%

We also consider emissions from the consumption of liquefied natural gas (156 million tons), while Shan et al. did not. Figure 6 shows the Kendall correlation analysis of the CO2 emissions in the two databases, which indicates that the results of the two databases are highly correlated.

[...] These data are subject to high uncertainty and will be updated after the publication of the 2018 Annual Report on China's Environmental Statistics. (2) The IPCC has published GHG emission factors for the industrial processes of products such as nitric acid, methanol, ferroalloys, raw magnesium, lead, and zinc, but the amounts of these industrial products are not currently available. Future updates will seek additional data to estimate GHG emissions from these processes. (3) Household biomass combustion, waste disposal, and fuel combustion by private vehicles all contribute to environmental pressures. However, the relevant consumption data are not readily available. We will take these emissions into account when the relevant data become available.
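The Monte Carlo procedure described under Uncertainties can be sketched in Python as follows. The sketch assumes a single emission source whose emission is the product of an activity level and an emission factor, each drawn from a normal distribution whose standard deviation equals the coefficient of variation times the mean. The coefficients of variation (5% and 25%) are those listed in Table 1, while the central values, the function names, and the use of the 2.5th and 97.5th percentiles as interval bounds are assumptions made for illustration.

```python
import random


def simulate_emissions(activity, activity_cv, factor, factor_cv,
                       n_samples=100_000, seed=0):
    """Draw Monte Carlo samples of emission = activity x emission factor."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = []
    for _ in range(n_samples):
        a = rng.gauss(activity, activity_cv * activity)
        f = rng.gauss(factor, factor_cv * factor)
        samples.append(a * f)
    return samples


def relative_interval(samples, central, lower_q=0.025, upper_q=0.975):
    """Empirical percentile bounds expressed relative to the central estimate."""
    ordered = sorted(samples)
    low = ordered[int(lower_q * (len(ordered) - 1))]
    high = ordered[int(upper_q * (len(ordered) - 1))]
    return low / central - 1.0, high / central - 1.0


# Hypothetical central values; CVs of 5% (activity) and 25% (emission factor).
activity, factor = 1000.0, 0.2
samples = simulate_emissions(activity, 0.05, factor, 0.25)
low, high = relative_interval(samples, central=activity * factor)
print(f"relative uncertainty: {low:+.1%} to {high:+.1%}")
```

With several sources, the same sampling would be repeated per source and the sampled emissions summed before taking percentiles, so that the reported range reflects the whole inventory rather than a single term.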

Code availability
The MATLAB code used to merge the 149 sectors into 45, 49, 91, and 96 sectors is shown below for transparency and verifiability, taking the merging of 149 sectors into 96 sectors as an example.