Background & Summary

China has become the top power producer globally, and it had the largest share (19.5–26.7%) of global power generation from 2010 to 20181. The majority (70.4–82.5% during 2010–2018) of China’s power generation came from thermal power plants that combusted coal, oil plus natural gas, biomass or other fossil energy (accounting for 60.2–73.4% of the total capacity)2. As a result of this large-scale fuel combustion, China’s thermal power plants have become major sources of air pollutants, emitting 5.0–23.5%, 15.7–38.7% and 19.1–51.5% of China’s anthropogenic particulate matter (PM, defined as microscopic solid or liquid matter suspended in the atmosphere)3,4,5,6, SO24,5,6,7,8,9,10 and NOX4,5,6,7,8,9,10,11, respectively, from 2010 to 2017. These air pollutants (representing 5.9%, 23.1% and 21.5% of China’s anthropogenic PM, SO2 and NOX emissions, respectively, in 20155), through a series of physical processes and chemical reactions in the atmosphere12, contributed 7.6% of China’s population-weighted PM2.5 (PM with an aerodynamic diameter of or below 2.5 μm) concentration as of 201513,14, leading to severe haze events and human health damage nationwide.

To control power emissions, an emission inventory at high spatiotemporal resolutions is needed as the foundation for an analysis of power emission characteristics and specific policy designs15. There are some detailed (unit- or plant-level) inventories on air pollutant emissions from China’s power plants, such as the Global Power Emissions Database (GPED)15, the China Coal-fired Power Plant Emissions Database (CPED)16 and other bottom-up databases17,18,19,20. However, due to the lack of systematic real measurements, existing datasets rely on average emission factors (defined as the amount of air pollutants per unit of power generation or fuel consumption) and are subject to the following three limitations. First, average emission factors are not the results of measurements; rather, these values are proxies of broad technology classes, and the values are dependent on many assumptions and indirect parameters (e.g., pollutant contents, oxidation rates, net heating values and control technology efficiencies), which cause high uncertainty21. Second, average emission factors are at a quite aggregated level where they are fixed to uniform and invariable values, thereby failing to reflect the heterogeneous and time-varying features of individual power plants. Third, the available emission factors were estimated before 2012; however, China has carried out a series of mitigation measures that have brought great renovations and technological changes to Chinese power plants, such as the GB13223-2011 emissions standards implemented in 201222 and the ultra-low emissions (ULE) standards promulgated in 201423. Therefore, introducing direct measurements rather than using indirect average emission factors provides a promising method to reduce the uncertainty in the existing inventories.

This study is the first to develop an inventory for China’s power emissions based on systematic actual measurements15,16,17,18,19,20, which is named the China Emissions Accounts for Power Plants (CEAP). We introduce the data from China’s continuous emission monitoring systems (CEMS) network, i.e., the actual measurements for nationwide, real-time stack concentrations of PM, SO2 and NOX from most of the thermal power plants in China (representing 96–98% of the total thermal capacity)24. The introduction of CEMS data can effectively address the three limitations in existing inventories that use the average emission factors in the following ways15,16,17,18,19,20. First, the CEMS-based estimation for emission factors and absolute emissions based on real emission data greatly reduces the uncertainty associated with average emission factors that are reliant on many assumptions and uncertain parameters15. Second, the source-level, real-time CEMS data provide rich information and improve the spatiotemporal resolutions, reflecting the heterogeneous and time-varying characteristics of power emissions16. Third, the CEMS data for the period of 2014–2017 are collected, and the emission factors for China’s power plants are updated, enabling an ex post analysis on the mitigation effects of recent clean air policies, such as GB13223-2011 and ULE standards22,23.

This CEAP dataset provides nationwide, detailed, dynamic PM, SO2 and NOX emissions from China’s power plants for 2014–2017 based on comprehensive, source-level, real-time data from China’s CEMS network. In addition, the CEAP dataset encompasses rich information regarding fuel use, operating capacity, geographic distribution, etc. for each power plant. The CEAP dataset has already been employed to conduct an ex post analysis on the efficacy of the ULE policy in mitigating China’s power emissions24 and will facilitate further research on the environmental improvements and health benefits associated with the mitigation effect14,24,25, help policymakers design future clean air policies14, and provide implications for other countries seeking to understand and regulate their power emissions.

Methods

Scopes and databases

The CEAP dataset comprises all the thermal power plants operating in China, totalling 2,714 plants (or 6,267 units), from 2014 to 2017 in 26 provinces and 4 municipalities (except Hong Kong, Macao, Taiwan and Tibet; Table 1). The thermal power plants produce electricity by combusting a variety of fossil energies, which fall into 4 categories: coal, gas plus oil, biomass and others (detailed in Table 2).

Table 1 China’s thermal power plants in CEAP.
Table 2 Fuel type descriptions.

The CEAP dataset integrates two databases, i.e., the CEMS data and unit-specific information. The CEMS data—the direct, real-time measurements of stack gas concentrations of PM, SO2 and NOX from China’s power plant stacks—are monitored by China’s CEMS network and reported to the China Ministry of Ecology and Environment (MEE; http://www.envsc.cn/). The CEMS data are recorded on a source and hourly basis. In total, the CEMS dataset covers 4,622 emission sources (i.e., power plant stacks) associated with 5,606 units (accounting for 98% of China’s thermal power capacity), 35,064 hours from 2014 to 2017, and 3 air pollutants (i.e., PM, SO2 and NOX) for each source-hour sample (Table 3). The MEE has also provided stack-specific information (regarding latitude and longitude, heights, temperature, diameter, etc.; http://permit.mee.gov.cn/).

Table 3 CEMS coverage of China’s thermal power plant units or stacks in CEAP.

Unit-specific information is also derived from the MEE, involving activity levels (energy consumption and power generation), operating capacities, geographic allocations and pollution control equipment (particularly the types and removal efficiencies) at a yearly frequency. Due to data availability, the unit information is available only until 2016, and the activity levels for 2017 are projected following the overall trends in provincial thermal power generation between 2016 and 2017 (which are available in the China Energy Statistical Yearbooks26), under the assumption that new units constructed in 2017 have the same structures of installed capacities, energy uses and regions as those of the existing units in 2016.

With a combination of the two datasets, the CEAP dataset provides nationwide, plant-level, dynamic PM, SO2 and NOX emissions from China’s thermal power plants from 2014 to 2017. Relative to existing inventories, the CEAP dataset is innovative in that it incorporates comprehensive real CEMS-measured emission data, avoiding the use of average emission factors and the associated operational assumptions and uncertain parameters.

Pre-processing of CEMS data

We have been exclusively granted access to the data from China’s CEMS network. Generally, the CEMS consists of a sampling system (for filtering and sampling flue gas), an online analytical component (for monitoring flue gas parameters, particularly emission concentrations) and a data processing system (for collecting, processing and reporting monitoring data)27,28. According to the GB13223-2003 regulation29, the CEMS network should cover all power plant furnaces that burn coal (except stoker and spreader stoker) and oil and generate >65 tons of steam each hour, as well as those that burn pulverized coal and gas. Thus, some power plants have not yet been incorporated into the CEMS network (accounting for 3–4% of the total thermal power capacity from 2014 to 2017) because their furnaces did not meet the requirements necessary to install a CEMS. For the power plants outside the CEMS network, we assume their stack concentrations are similar to the averages of the units with similar fuel types and similar regions within the CEMS network.
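The peer-average assumption for plants outside the CEMS network can be illustrated with a minimal sketch. The record layout (keys `fuel`, `region`, `conc`) and the sample values are hypothetical, and peers are matched on identical fuel type and region, which is a simplification of the "similar" fuel types and regions described above.

```python
from collections import defaultdict


def fill_without_cems(units):
    """Assign peer-average concentrations to units without CEMS data.

    `units` is a list of dicts with hypothetical keys 'fuel', 'region'
    and 'conc' (mean stack concentration in g m-3, or None if the unit
    is outside the CEMS network). The input list is not modified.
    """
    # Collect the measured concentrations of each (fuel, region) group.
    peers = defaultdict(list)
    for u in units:
        if u["conc"] is not None:
            peers[(u["fuel"], u["region"])].append(u["conc"])

    filled = []
    for u in units:
        conc = u["conc"]
        group = peers.get((u["fuel"], u["region"]))
        if conc is None and group:
            conc = sum(group) / len(group)  # average of monitored peers
        filled.append({**u, "conc": conc})
    return filled


# Hypothetical example: unit C borrows the mean of units A and B.
units = [
    {"id": "A", "fuel": "coal", "region": "Hebei", "conc": 0.020},
    {"id": "B", "fuel": "coal", "region": "Hebei", "conc": 0.030},
    {"id": "C", "fuel": "coal", "region": "Hebei", "conc": None},
    {"id": "D", "fuel": "gas", "region": "Hebei", "conc": None},
]
result = fill_without_cems(units)
```

In this sketch a unit with no monitored peer (unit D) keeps a missing value; how such cases are handled in CEAP is not specified here.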

To guarantee the reliability of CEMS data, China’s government has made great efforts in developing specific regulations and technical guidelines for power plants and local entities to follow and supervise, respectively24,28,30,31,32. These official documents elaborate on all the processes required to regulate the CEMS network, including not only CEMS installation, operation, inspection, maintenance and repair but also CEMS data collection, processing, reporting, analysis and storage28,32,33. Since 2014, all state-monitored companies have been mandated to report their CEMS data to the local governments through a series of online platforms for different provinces (listed in Supplementary Table 1). Local entities conduct random onsite inspections to check the truthfulness of the reported results on at least a quarterly basis23,24,28,32,34; this system enables a comparison of CEMS data across different firms to explore potential outliers and abnormalities and prevent data manipulation28,35. Then, the governments release the inspection results to the public through the same online platforms (listed in Supplementary Table 1)24,36,37. Severe financial penalties and criminal punishments can be imposed on firms that manipulate data (for example, by deleting, distorting or forging CEMS data)38,39.

The malfunction of CEMS monitors may also introduce large uncertainty to CEMS data during the processes of operation (indication errors, span drift, zero drift, etc.), maintenance (particularly the failure to perform calibration and reference tests) and data reporting (invalid data communication, data missing, etc.)24,28. Accordingly, each power plant is required to make at least one A-, B- and C-grade overhaul for 32–80, 14–50 and 9–30 days per 4–6, 2–3 and 1 year(s), respectively, as well as one D-grade overhaul (if needed) for 5–15 days per year, to check, maintain and upgrade its technologies, thereby reducing measurement uncertainty40. During these overhauls, CEMS operators conduct CEMS calibration (i.e., zero and span calibration), maintenance procedures (e.g., examining and cleaning major CEMS components and replacing or upgrading parts, if necessary, such as optical lens, filter and sampling meter) and a reference test (i.e., relative accuracy test audit). Furthermore, third-party operators examine CEMS operation and maintenance routines, to guarantee standardized CEMS operation and facilitate improvement in CEMS data accuracy27,28,31. All the related activities should be documented according to standardized requirement contents27,28. Even with the aforementioned efforts, there is still a small proportion of nulls and outliers in the CEMS database, which represent 1% and 0.1% of the total operating hours, respectively, from 2014 to 2017. We treat these samples seriously by following the relevant official documents, which have been released by China’s government. Table 4 provides the treatment methods for nulls or zeros, which can be divided into 3 types based on duration. On the one hand, we consider nulls and/or zeros that span at least 5 successive days as a downtime or overhaul and omit them in the estimation, according to the regulation27. 
On the other hand, missing data lasting < 5 days are treated as outliers (i.e., impossible values in operation) and processed in two different ways. Runs of nulls and/or zeros lasting > 24 hours are assumed to lie close to the valid values recorded around the same time and are set to the monthly average27:

$${\hat{C}}_{s,i,y,m,h}={\bar{C}}_{s,i,y,m,\bullet }$$
(1)

where \({C}_{s,i,y,m,h}\) denotes the stack gas concentrations of pollutant s emitted by unit i for year y, month m and hour h (i.e., the actual measurements monitored by the CEMS network), defined as the amount of pollutants per unit of emitted stack gas (g m−3)41,42; \({\widehat{C}}_{s,i,y,m,h}\) is the imputation for the missing data \({C}_{s,i,y,m,h}\); \({\bar{C}}_{s,i,y,m,\bullet }\) is the mean of the hourly valid values for the same pollutant, unit, year and month as \({C}_{s,i,y,m,h}\). In contrast, missing data spanning 1–24 hour(s) are interpolated with the arithmetic average of the two nearest valid points before and after the gap27,43:

$${\widehat{C}}_{s,i,y,m,h}=\frac{{C}_{s,i,y,m,h-l}+{C}_{s,i,y,m,h+q}}{2}$$
(2)

where \({C}_{s,i,y,m,h-l}\) and \({C}_{s,i,y,m,h{\rm{+}}q}\) represent the nearest last known measurements (l hour(s) before) and next known measurements (q hour(s) after), respectively, for the missing data \({C}_{s,i,y,m,h}\), namely, the series data \({C}_{s,i,y,m,h-l+1}\),…, \({C}_{s,i,y,m,h}\),…, \({C}_{s,i,y,m,h+q-1}\) are all missing values. Furthermore, we treat the measurements that are out of the measurement ranges of CEMS instruments (outside of which the data are unreliable30,44; detailed in Supplementary Table 2) as abnormal data and process them in a similar way to nulls according to the official regulation27.
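The three duration-based rules for nulls and zeros (downtime, Eq. (1) and Eq. (2)) can be sketched as follows. This is an illustration under stated assumptions rather than the processing code used for CEAP: the function names are hypothetical, the input is one unit-month of hourly concentrations with `None` or `0` marking missing values, and a short gap that touches the start or end of the month (so that one neighbour is unavailable) falls back to the monthly mean.

```python
HOURS_PER_DAY = 24
DOWNTIME_HOURS = 5 * HOURS_PER_DAY  # gaps of >= 5 days count as downtime


def is_missing(c):
    """Nulls and zeros are both treated as missing."""
    return c is None or c == 0


def impute_unit_month(conc):
    """Return a copy of `conc` with short gaps imputed.

    - gaps of 1-24 h: mean of the nearest valid values before and after
      the gap (Eq. 2);
    - gaps of > 24 h but < 5 days: monthly mean of the valid values (Eq. 1);
    - gaps of >= 5 days: left as None (downtime, omitted from estimation).
    """
    valid = [c for c in conc if not is_missing(c)]
    out = [None if is_missing(c) else c for c in conc]
    if not valid:
        return out
    monthly_mean = sum(valid) / len(valid)

    n, h = len(out), 0
    while h < n:
        if out[h] is not None:
            h += 1
            continue
        start = h
        while h < n and out[h] is None:
            h += 1
        length = h - start  # the gap covers hours [start, h)
        if length >= DOWNTIME_HOURS:
            continue  # left as None
        before = out[start - 1] if start > 0 else None
        after = out[h] if h < n else None
        if length <= HOURS_PER_DAY and before is not None and after is not None:
            fill = (before + after) / 2  # Eq. (2)
        else:
            fill = monthly_mean  # Eq. (1), or boundary fallback (assumption)
        out[start:h] = [fill] * length
    return out
```

For example, `impute_unit_month([10.0, None, 20.0])` fills the single missing hour with the average of its two neighbours.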

Table 4 Treatment methods for nulls and the relevant official documents.

CEMS-based estimation of emission factors and absolute emissions

The introduction of real CEMS-monitored measurements provides a direct estimation for emission factors on a source and hourly basis, avoiding the use of average emission factors with many assumptions and uncertain parameters17,42,44.

$$E{F}_{s,i,y,m,h}={C}_{s,i,y,m,h}{V}_{i,y}$$
(3)

In Eq. (3), \(E{F}_{s,i,y,m,h}\) indicates the emission factor, defined as the amount of emissions per unit of fuel use (in g kg−1 for solid or liquid fuel and in g m−3 for gas fuel), and \({V}_{i,y}\) is the theoretical flue gas rate, defined as the expected volume of flue gas per unit of fuel use under standard production conditions (m3 kg−1 for solid or liquid fuel and m3 m−3 for gas fuel)42, which was estimated by the China Pollution Source Census (2011)45 based on sufficient field measurements (detailed in Table 5). Based on Eq. (3), abated emission factors can be directly obtained even without the use of removal efficiencies and the relevant parameters, because CEMS monitors the gas concentrations at stacks after the effect of control equipment (if any).
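The unit arithmetic behind Eq. (3) can be checked with a short sketch. The concentrations and the flue gas rate below are hypothetical values chosen for illustration (they are not taken from Table 5 or from the CEMS data), and averaging the hourly emission factors into a monthly value is an assumption made here for the example.

```python
def emission_factor(concentration, flue_gas_rate):
    """Eq. (3): emission factor per unit of fuel use.

    concentration : stack gas concentration C, in g m-3
    flue_gas_rate : theoretical flue gas rate V, in m3 of flue gas per kg
                    of solid or liquid fuel (or per m3 of gas fuel)
    Returns g per kg of fuel (or g per m3 of gas fuel).
    """
    return concentration * flue_gas_rate


# Hypothetical hourly SO2 concentrations for a coal-fired unit.
hourly_so2 = [0.030, 0.035, 0.040]  # g m-3
v_coal_unit = 9.0                   # m3 kg-1 (illustrative only)

hourly_ef = [emission_factor(c, v_coal_unit) for c in hourly_so2]
monthly_ef = sum(hourly_ef) / len(hourly_ef)  # about 0.315 g kg-1
```

Because the concentration is measured at the stack, after any control equipment, no removal efficiency appears in the calculation.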

Table 5 Theoretical flue gas rate.

Notably, recent clean air policies (particularly different emissions standards) target stack concentrations, such that a large proportion of missing data exist regarding other measurements (particularly flue gas rates, with missing data accounting for 34.62%, 31.91%, 29.97% and 42.96% of the total samples in 2014, 2015, 2016 and 2017, respectively). Accordingly, we introduce theoretical flue gas rates into the estimation to avoid significant underestimation of the actual volume when there are too many missing data values46. In addition, the adoption of theoretical flue gas rates can address flue gas leakage, a common problem in power plants that greatly distorts the real flue gas volume46. The theoretical flue gas rates are derived from the China Pollution Source Census, with values varying across operating capacities, fuel types and boiler types42,45. Thus, the actual volume of flue gas is computed in terms of the theoretical flue gas rate times actual fuel consumption.

The absolute emissions of PM, SO2 and NOX from individual power plants can be estimated in terms of the emission factors times the activity levels21:

$${E}_{s,i,y,m}=E{F}_{s,i,y,m}{A}_{i,y,m}$$
(4)

where \({E}_{s,i,y,m}\) represents the air pollution emissions (g); and \({A}_{i,y,m}\) is the activity data, i.e., the amount of fuel use (kg for solid or liquid fuel and m3 for gas fuel). In the CEAP dataset, power plant emissions are estimated on a monthly basis (the smallest scale for activity data), in which the yearly unit-level activity data are allocated at the monthly scale using the monthly province-level thermal power generation as weights16:

$${A}_{i,y,m}=\frac{{F}_{{p}_{i},y,m}}{{\sum }_{m=1}^{12}{F}_{{p}_{i},y,m}}{A}_{i,y}$$
(5)

where \({F}_{{p}_{i},y,m}\) denotes the thermal power generation of province \({p}_{i}\) in month m of year y, which is obtained from the China Energy Statistical Yearbooks26, and \({p}_{i}\) indicates the province where unit i is located.
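Eqs. (4) and (5) can be combined in a brief sketch: the annual fuel use of a unit is split across months in proportion to provincial generation, and each monthly share is multiplied by the monthly emission factor. The function names and all numerical inputs are hypothetical.

```python
def allocate_monthly_activity(annual_activity, provincial_generation):
    """Eq. (5): split a unit's annual fuel use across 12 months, using
    monthly provincial thermal power generation as weights."""
    if len(provincial_generation) != 12:
        raise ValueError("expected 12 monthly generation values")
    total = sum(provincial_generation)
    return [annual_activity * f / total for f in provincial_generation]


def monthly_emissions(monthly_ef, monthly_activity):
    """Eq. (4): emissions (g) = emission factor x fuel use, by month."""
    return [ef * a for ef, a in zip(monthly_ef, monthly_activity)]


# Hypothetical inputs for one coal-fired unit.
generation = [120, 100, 95, 90, 92, 105, 130, 135, 100, 90, 98, 115]
annual_fuel_kg = 1.2e9
activity = allocate_monthly_activity(annual_fuel_kg, generation)
emissions_g = monthly_emissions([0.3] * 12, activity)  # EF in g kg-1
```

Only the relative size of the monthly generation values matters, so they can be supplied in any consistent unit.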

Data Records

A total of 12 data records (emissions and plant/unit information inventories) are contained in the CEAP dataset, which have been uploaded to the public repository figshare47. Of these:

  • 4 are emission inventories for China’s power plants (2014–2017) [“CEAP-Absolute emissions, 2014–2017”];

  • 4 are stack gas concentration inventories for China’s power plants (2014–2017) [“CEAP- Stack gas concentrations, 2014–2017”];

  • 4 are summary descriptions for China’s power plants (2014–2017) [“CEAP-Summary descriptions, 2014–2017”].

The CEAP dataset introduces systematic real measurements by China’s CEMS network to directly estimate the PM, SO2 and NOX emissions from China’s power plants during 2014–2017 (Fig. 1). In particular, the dataset provides plant-level information about absolute emissions, fuel uses, generating capacities and geographic allocations for 2,583, 2,714, 2,596 and 2,596 power plants from 2014 to 2017, respectively. In addition, the CEAP dataset presents dynamic stack concentrations by region and fuel type and describes the overall structures of operating units, capacities, ages, emission factors, emissions and CEMS coverage for China’s thermal power plants.

Fig. 1

Estimated power emissions in China from 2014 to 2017. (a–c) Monthly estimates for the total and regional (coloured bars) emissions (Gg) of PM (a), SO2 (b) and NOX (c) from Chinese power plants. The error bars indicate the uncertainty ranges.

Technical Validation

Uncertainties

The CEMS-based estimates are subject to uncertainties arising from volatilities in the CEMS data, the introduction of theoretical flue gas rates and the projection of activity data. Thus, uncertainty analyses are performed to verify the robustness of our estimates. Generally, the uncertainty analysis on each examined model variable or parameter (emission concentrations, theoretical flue gas rates or activity data) includes five main steps: (a) estimate the probability distributions by fitting the data to a given distribution as the input of the Monte Carlo approach; (b) generate random values based on the probability distributions via Monte Carlo simulation; (c) put the random values into Eqs. (3–5) to replace the original values and obtain a new set of estimates for emission factors and total emissions; (d) repeat steps (b) and (c) 10,000 times and obtain 10,000 sets of results16,17,48,49; and (e) yield the uncertainty ranges of our estimates in terms of 2 standard deviations of the 10,000 new sets of results21. Table 6 reports the related results and reveals that the uncertainties can be controlled within a small range (i.e., ±9.03% and ±2.47% for emission factors and absolute emissions, respectively).
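Steps (a) to (e) can be sketched for a single unit-month as follows. This is a simplified illustration, not the analysis code: only the stack concentration is perturbed, a normal distribution is assumed, and the function name, the daily averages, the flue gas rate and the fuel use are all hypothetical.

```python
import random
from statistics import mean, stdev


def emission_uncertainty(daily_conc, flue_gas_rate, fuel_use,
                         n_draws=10_000, seed=0):
    """Relative uncertainty (2 standard deviations, in %) of the
    emissions of one unit-month when the concentration is resampled."""
    rng = random.Random(seed)
    mu, sigma = mean(daily_conc), stdev(daily_conc)   # step (a)
    results = []
    for _ in range(n_draws):                          # step (d)
        c = rng.gauss(mu, sigma)                      # step (b)
        ef = c * flue_gas_rate                        # step (c), Eq. (3)
        results.append(ef * fuel_use)                 # step (c), Eq. (4)
    return 2 * stdev(results) / mean(results) * 100   # step (e)


# Hypothetical daily-average concentrations (g m-3) for one unit-month.
daily = [0.030, 0.032, 0.028, 0.031, 0.029]
range_percent = emission_uncertainty(daily, flue_gas_rate=9.0, fuel_use=1.0e8)
```

With a single perturbed input, the relative range depends only on the spread of the concentrations; the fixed flue gas rate and fuel use scale every draw by the same factor.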

Table 6 Uncertainty ranges of the estimated emission factors and absolute emissions.

Uncertainties in CEMS data

The volatility in stack gas concentrations (the key model inputs in our estimation) should be considered in the uncertainty analysis42. As the hourly CEMS measurements are recorded as an average over an hour time period, the associated volatility well reflects real variability in the emissions (as power demand rises and falls throughout the day, for example)32. We assume normal distributions for stack concentrations for each unit on a monthly basis and then draw the related parameters of distributions (e.g., the mean and the standard deviation) through data fitting based on the associated daily averages of the CEMS measurements50,51. For a unit without CEMS, the bootstrap method is used to select samples from the units of the same fuel type and the same region in the CEMS network at an equal probability. Then, the Monte Carlo simulation is performed to generate random stack concentrations based on the associated distributions17,42. With 10,000 simulations, the uncertainty ranges of the estimates are assessed to be small, i.e., ±8.65% and ±1.09% for the emission factors and absolute emissions, respectively.

Measurement uncertainties lead to a certain level of CEMS-monitored stack concentration deviations28. According to the official regulation27, a qualified CEMS instrument should control the error tolerance within ±15%, ±5% and ±5% for PM, SO2 and NOX concentrations, respectively. Accordingly, we assume uniform distributions within the allowed tolerance ranges for all stack concentrations on the hourly, unit and pollutant basis. Then, random stack concentrations are generated using the Monte Carlo technique and put into Eq. (3) replacing the associated original values. A total of 10,000 simulations are run to estimate the uncertainty ranges of our estimates (in terms of 2 standard deviations). The results show that the final uncertainties fall within ±10.38% for emission factors and ±0.59% for total emissions.
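The tolerance-based perturbation can be sketched as follows. The tolerance values are those quoted above; the function names and the sample series are hypothetical, and the hourly errors are assumed to be independent, which is an assumption of this sketch rather than a statement about the CEMS instruments.

```python
import random
from statistics import stdev

# Allowed error tolerance of a qualified CEMS instrument, by pollutant.
TOLERANCE = {"PM": 0.15, "SO2": 0.05, "NOX": 0.05}


def perturb(conc, pollutant, rng):
    """Scale each hourly value by a uniform error within the tolerance."""
    tol = TOLERANCE[pollutant]
    return [c * (1 + rng.uniform(-tol, tol)) for c in conc]


def mean_range_percent(conc, pollutant, n_draws=10_000, seed=0):
    """Uncertainty range (2 standard deviations, in %) of the mean
    concentration over the period covered by `conc`."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_draws):
        draw = perturb(conc, pollutant, rng)
        means.append(sum(draw) / len(draw))
    centre = sum(means) / len(means)
    return 2 * stdev(means) / centre * 100


# Hypothetical PM series: one hour versus a 24-hour average.
single_hour = mean_range_percent([0.02], "PM", n_draws=2000)
one_day = mean_range_percent([0.02] * 24, "PM", n_draws=2000)
```

Under the independence assumption, the range of the 24-hour mean is narrower than that of a single hour because the hourly errors partly cancel when averaged.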

Uncertainties in theoretical flue gas rates

Given that a large proportion of measurements of actual flue gas rates are missing in CEMS data (29.97–42.96% from 2014 to 2017), we introduce theoretical flue gas rates (fourth column of Table 5) in the estimation. Even though this method can avoid significant underestimation and the distortion caused by flue gas leakage, uncertainties might arise due to the heterogeneity across units in factors such as technologies, operational situations and feedstocks. We assess the uncertainty ranges of flue gas rates (defined as the lower and upper bounds of a 95% confidence interval around the central estimates16,48; sixth column) using the real samples in the CEMS database for 1,373 units that have different unit capacities, fuel types and boiler types and are located throughout mainland China (fifth column). A single-sample two-tailed t-test is conducted, and the results (last column) indicate that the mean CEMS-monitored flue gas rates (fifth column) are at similar levels to the theoretical values that we used (fourth column). In the uncertainty analysis, Monte Carlo simulation is conducted to produce random flue gas rates following a uniform distribution on the associated uncertainty ranges48,52. For the unit types without uncertainty ranges (e.g., those burning solid waste, oil and petroleum coke), the largest range (i.e., ±10.07%) is employed. Relying on 10,000 simulations, the results show that uncertainty ranges can be well controlled within ±6.90% and ±0.23% for the emission factors and absolute emissions, respectively.

Uncertainties in activity data

The unit-specific activity data are available only up to 2016, and the 2017 values are projected using the monthly provincial data for 2017. This approach assumes that the growth rates in the activity levels of different units in a province are uniform from 2016 to 2017, which somewhat contradicts reality and brings about uncertainties. To assess such uncertainties, a bootstrap method is used to generate 10,000 samples of the growth rates from the previous values from 2014 to 2016, and statistical analysis is employed to fit these samples in a normal distribution. The Monte Carlo simulation is performed to generate random growth rates and thence the growth of activity levels from 2016 to 2017 for individual units, and the total provincial growth is allocated into each unit using the random growth as weights. With 10,000 simulations, the uncertainty range of total emissions is estimated to be quite small (within ±0.03%).
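One possible reading of this procedure is sketched below for a single province and a single Monte Carlo draw. The function name, its arguments and the sample values are hypothetical, and the final rescaling step is an interpretation of "using the random growth as weights": each unit receives a random growth rate, and the grown values are rescaled so that they add up to the projected provincial total.

```python
import random
from statistics import mean, stdev


def draw_2017_activity(activity_2016, provincial_ratio, past_growth, rng,
                       n_boot=10_000):
    """One random draw of unit-level 2017 activity for one province.

    activity_2016    : 2016 fuel use of each unit in the province
    provincial_ratio : provincial generation in 2017 divided by 2016
    past_growth      : observed unit growth rates from 2014 to 2016
    """
    # Bootstrap the historical growth rates and fit a normal distribution.
    boot = rng.choices(past_growth, k=n_boot)
    mu, sigma = mean(boot), stdev(boot)
    # Draw a random growth rate for each unit.
    grown = [a * (1 + rng.gauss(mu, sigma)) for a in activity_2016]
    # Rescale so that the units add up to the projected provincial total.
    target = sum(activity_2016) * provincial_ratio
    scale = target / sum(grown)
    return [g * scale for g in grown]


# Hypothetical province with three units and 5% growth in generation.
rng = random.Random(1)
draw = draw_2017_activity([100.0, 200.0, 300.0], 1.05,
                          [0.02, -0.01, 0.04, 0.03], rng)
```

Repeating the draw 10,000 times and recomputing Eq. (4) for each draw would give the spread of total emissions attributable to the activity projection.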

Comparison with existing databases

We compare our estimates with existing databases, finding that our estimates of Chinese power emissions (using the real CEMS measurements for 2014–2017; purple bars in Fig. 2) are 18.62–91.86%, 54.98–69.77% and 17.55–67.76% below previous estimates (based on average emission factors that were evaluated up to 2012 without considering the recent mitigation effect particularly attributable to the ULE standards policy promulgated in 2014) for PM, SO2 and NOX, respectively. Furthermore, using the detailed measurements on the source and hour basis, the uncertainty of our estimates can be controlled at a relatively low level (error bars).

Fig. 2

Comparison of estimated power emissions in China from 2014 to 2017. (a–c) The estimated Chinese power emissions (Tg) for PM (a), SO2 (b) and NOX (c) in our database (purple bars) and in existing databases (Refs.5,10,11,20,53,54,55; the Greenhouse Gas and Air Pollution Interactions and Synergies database (GAINS) (https://gains.iiasa.ac.at/models/gains_models3.html); the Multi-resolution Emission Inventory for China (MEIC) (http://meicmodel.org/); non-purple bars). The error bars indicate the associated uncertainty ranges.

Limitations and future work

The CEAP dataset can be improved and extended from the following perspectives. First, some power plants have not yet been incorporated into the CEMS network; these account for an average of 3.8% of the total thermal capacity for 2014–2017. Therefore, collecting and incorporating these samples is needed to extend the CEAP dataset. Second, apart from air pollutants from power plants, the CEMS network monitors both air and water pollutants from various industries, totalling over 30,000 emission sources. Based on these data, the CEAP database can be extended into multisector datasets for both air and water pollutants in the future. Third, due to data availability, the estimation does not use high-frequency information about activity data, such that the CEMS data are the main driver of the variation in power emissions on a monthly scale. Future research will incorporate hourly operational data (especially fuel consumption and flue gas rates) for each unit to improve the reliability of emissions estimates. Fourth, although great efforts have been made to guarantee the reliability of CEMS data, rigorous verification work (such as aerial concentration measurements) is still needed to check the data quality of the CEMS system41.