Stronger policy required to substantially reduce deaths from PM2.5 pollution in China

Air pollution kills nearly 1 million people per year in China. In response, the Chinese government implemented the Air Pollution Prevention and Control Action Plan (APPCAP) from 2013 to 2017 which had a significant impact on reducing PM2.5 concentration. However, the health benefits of the APPCAP are not well understood. Here we examine the spatiotemporal dynamics of annual deaths attributable to PM2.5 pollution (DAPP) in China and the contribution from the APPCAP using decomposition analysis. Despite a 36.1% increase in DAPP from 2000 to 2017, The APPCAP-induced improvement in air quality achieved substantial health benefits, with the DAPP in 2017 reduced by 64 thousand (6.8%) compared to 2013. However, the policy is unlikely to result in further major reductions in DAPP and more ambitious policies are required to reduce the health impacts of air pollution by 2030 and meet the United Nation’s Sustainable Development Goal 3.

reported the provincial death rates in 2017 in China. However, these data are only accessible for four diseases (IHD, stroke, COPD and LC) in three provinces (Beijing, Hebei and Sichuan). Hence, we only compared the results that can be estimated by these available data. IHD, COPD, LC refer to ischemic heart disease, chronic obstructive pulmonary disease and lung cancer, respectively. The relative difference was calculated by (DAPP national death rates −DAPP provincial death rates ) ∕ DAPP national death rates .

Supplementary Note 1. Detailed uncertainty analysis.
Although we considered the 90% confidential intervals of the Integrated Exposure-Response (IER) function as a major source of uncertainty, there are several other uncertainties in our estimation and decomposition of deaths attributable to PM 2.5 pollution (DAPP). First, the input data might lead to some error and bias. Regarding the estimation of PM 2.5 concentration, the PM 2.5 concentration data derived from satellite imagery and modeling tends to be lower than the monitoring data 23 . Although the concentration data in our research was further calibrated by monitoring data, our figures may still be conservative. In addition, at a 10km grid cell spatial resolution, the PM 2.5 data is relatively coarse which could also lead to underestimation 24 . Besides, we used the annual average PM 2.5 concentration to express the level of exposure directly, without considering the influence of behavior (inhalation rate and outdoor time) on personal exposure 13,25 . Regarding other input data, especially death rates and age structure, we assumed that the death rates and age structure were homogeneous across the country, because of the accessibility of data. This may also lead to some generalization in the estimation of DAPP at the provincial level. To assess this impact, we compared our estimates and the DAPP derived from provincial death rates using data available only for three provinces (Beijing, Hebei and Sichuan) in 2017, with other inputs fixed. The comparison showed that the use of national death rates in our study led to an overestimate of deaths for Beijing, Hebei and Sichuan. Specifically, for Beijing which has a much better healthcare standard, the DAPP estimated based on provincial death rates was more than 50% lower relative to our estimation based on national death rates. Besides, the differences are relatively minor for Hebei and Sichuan which represent a medium healthcare standard in China. The DAPP estimated based on provincial death rates are 3.2% and 6.1% lower than our main results, respectively 22 (Supplementary Table 2).
Second, the exposure-response relationship between PM 2.5 concentration and the risk of diseases has some underlying assumptions. For example, the toxicity of PM 2.5 is assumed to be remain constant with different PM 2.5 composition. The confounding effects among the different risk factors (e.g., smoking and alcohol use) are not considered, and the relative risk was quantified in terms of annual average PM 2.5 concentration without considering the short-term variation 2,26 . For PM 2. 5

concentration in
China, the proportion of black carbon is higher than the global average 27 . As the toxicity of black carbon is higher than other PM 2.5 components 28 , DAPP tends to be underestimated. The fitting strategy of the exposure-response relationship will also influence our estimation. Although the IER function is  Figure 4).
Third, the selection of diseases may introduce uncertainty. We only considered 6 diseases in our estimation, which could lead to an underestimation of DAPP as PM 2.5 pollution is related to a range of other diseases including mental health 30 , Alzheimer's disease 31 and other non-communicable diseases 29 .
In addition, because we did not consider the potential double-counting of the deaths arising from household PM 2.5 pollution, this may introduce some bias in our results 32 .

Supplementary Note 2. Detailed description of PM2.5 concentration data sources.
In this research, we used the gridded PM 2.5 concentration from the EFPMV4CH02 dataset (available from 2000-2016) 33 Figure 7).
where C i,t is the PM 2.5 concentration at pixel i in year t, and the unit is μg m −3 . α a,d ，γ a,d ，δ a,d , and C 0 a,d are the parameters of the IER model. These parameters were fitted based on multiple epidemiological studies. Specific IER models were used for a given age a and a given disease d 2 . Since the fitting of the IER model has some uncertainty, following previous studies, the central estimate and 90% uncertainty intervals were obtained as the 50, 5 and 95 percentiles from 1000 draws of each parameter for the IER fitting 38 . For ischemic heart disease and stroke the IERs were age-specific, but for other diseases the IERs were uniform across different age groups.
In GBD 2017, the PAF of ambient PM 2.5 pollution was calculated using a proportional approach, which used the IER to calculate the relative risk and PAF for both exposure to ambient PM 2.5 pollution and indoor PM 2.5 pollution. These were then weighted by the proportion of individuals exposed to each source, in order to avoid the potential double-counting of the health burden that may arise from PM 2.5 pollution from ambient and household sources 32 . In our study, because of the lack of corresponding indoor PM 2.5 pollution data, we used the method used in previous GBD reports to calculate the PAF of ambient PM 2.5 pollution individually 36,37,43,44 .
The pixel-level population POP i,t was obtained by allocating the statistical population data of each prefectural-level city. We used the population distribution of the History Database of the Global Environment (HYDE3.2) dataset 45 as the reference data and allocated the statistical population data 11 to each pixel at a spatial resolution of 10km. The process can be expressed as: where POP i,t refers to the population at pixel i in year t; HYDEPOP i,t refers to the population at pixel i in the HYDE3.2 dataset in year t; SPOP p,t refers to the statistical population of prefectural-level city p in year t; HYDEPOP i,p,t is the population at pixel i within the prefectural-level city p in the HYDE3.2 dataset in year t. Because some prefectural-level cities did not have statistical population data, we allocated their population in a similar way by using the statistical population data at the provincial level 16 .

Supplementary Note 5. Detailed formula of sequential Mann-Kendall test.
The sequential Mann-Kendall test is a nonparametric test, which is widely used in meteorological and hydrological research in detecting abrupt changes in trend [46][47][48] . The null hypothesis of the sequential Mann-Kendall test is that the elements show no beginning of a developing trend. In the sequential Mann-Kendall test 46 , the abrupt point (beginning of a developing trend) in a given series X with n samples can be detected as the intersection point of the test statistics u p (t) and u r (t), which are calculated based on the progressive row {x 1 , x 2 , x 3 …, x n } and retrograde row {x n , x n-1 , x n-2 …, x 1 } of the sample, respectively.
The u p (t) of a given series can be calculated as: where a t is an auxiliary statistic variable for the point t (varied from 2 to n). a t is assumed to be asymptotically normal, with a mean value E(a t ) and variance VAR(a t ). These variables can be calculated as: where R t is the number of precedent elements x j (j < t) that are smaller than x t , with t = 2, …, n, and j = 1, …, i−1; l t is the count of elements from x 1 to x t .
The u p (a t ) can be calculated using the similar equation, but using a reversed data series {x n , x n−1 , x n−2 …, x 1 } and multiplied by −1.
The point where the test statistics u p (t) and u r (t) cross marks the abrupt change in trend. If the absolute value of u p (t) and u r (t) is higher than the tipping point of the given significance level (1.28 for P=0.10 and 1.65 for P=0.05), the null hypothesis will be rejected, which indicates an abrupt change in trend 47 . The intersection point of the test statistics u p (t) and u r (t) can be detected as follows: , where the u p (t) and u r (t) represent the test values of the progressive and retrograde row, respectively.