Introduction

Beijing, the capital of China and home to more than 20 million people, probably never expected to inherit the title “Capital of Smog” that was applied to London 60 years ago. But the title now seems to fit Beijing well. Air pollution not only undermines Beijing’s reputation as a historic, world-renowned city but, more importantly, confronts citizens and the government with a critical challenge to sustainable urban development that involves major public health concerns. Of the common air pollutants, fine particulate matter (PM2.5) is believed to be the most serious because of its harmful effects on human cardiovascular, respiratory and pulmonary function1,2. There is increasing evidence that PM2.5 is associated with population mortality3,4 and with mortality from cardiovascular and respiratory diseases5, and that it has adverse impacts on the growth of new-borns6 and on mental health, including anxiety7.

Action is needed to mitigate PM2.5 pollution in Beijing and in other Chinese cities. The first step is to measure PM2.5 and analyse the measurements to determine their inherent variation patterns. A handful of studies have pursued this goal8,9,10,11,12,13,14,15,16. While these studies report interesting results, they have several shortcomings. Some are limited by data availability, such as a small number of PM2.5 monitoring stations8,11 or a short observation period12,13,14, and so cannot provide a sufficient overview of PM2.5 concentration patterns across Beijing and through a full year. Other studies9,10,15,16 analysed a full year of PM2.5 measurements across Beijing, provided by the air pollution monitoring network launched in late 2012. When analysing temporal variation, however, these studies assumed a priori that PM2.5 concentration follows seasonal, monthly or weekly patterns and built their analysis framework on an imposed seasonal, monthly or weekly profile. We argue that PM2.5 concentration may vary on time scales other than these predefined ones, so studies that rely on them are likely to provide incomplete information and miss important insights.

Instead of making arbitrary assumptions about weekly, monthly and seasonal patterns, our study lets the data reveal their own patterns. Using a full year of ground-level PM2.5 measurements in Beijing, from January 2014 to December 2014, we conducted two time-series clustering analyses of all the daily PM2.5 series. On this basis we offer an innovative calendar visualization of daily PM2.5 concentration over 2014, which yields important insights into temporal variation patterns of PM2.5 concentration.

The contribution of our study is two-fold. First, our study presents an innovative and straightforward calendar visualization of daily PM2.5 time-series in Beijing in the year of 2014. This technique provides a very useful tool to visualize and understand the data and can be applied to examine temporal patterns of other air pollutants. Second, the insights generated from the two calendar plots advance our understanding of Beijing’s PM2.5 concentration. Compared to previous studies on Beijing’s PM2.5 concentration, our study offers a different perspective and brings in insights on PM2.5 concentration that are more complete and convincing.

Results and Discussion

Figure 1 shows two calendar views of the cluster analyses, one using the correlation distance and one using the Euclidean distance, together with the two corresponding trend curves of averaged PM2.5 concentrations. The analysis based on correlation distance yields three clusters containing 162, 117 and 86 time-series (days), respectively. We named these S1, S2 and S3, as shown in Fig. 1a,c. The analysis based on Euclidean distance yields nine clusters containing 255, 82, 15, 5, 2, 2, 2, 1 and 1 time-series, respectively. They are named L1, L2, L3, L4, O1, O2, O3, O4 and O5 (Fig. 1b,d). Clusters with fewer than three time-series, namely O1, O2, O3, O4 and O5, are considered “outliers” that either have extremely high PM2.5 concentrations or exhibit odd variation patterns. We discuss these “outliers” later.

Figure 1

Calendar views of PM2.5 concentration clusters in Beijing in the year 2014.

(a) shows the PM2.5 time-series cluster result based on correlation distance, where the letter S denotes “shape”; (b) shows the cluster result based on Euclidean distance, where L denotes “level” and O refers to “outlier”. (c) shows the averaged PM2.5 trend for the clusters based on correlation distance and (d) shows the averaged PM2.5 trend for the clusters based on Euclidean distance. Note that the colours and labels are matched for each cluster for consistency and the lines for O1, O2, O3, O4 and O5 are drawn dashed for clarity.
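The calendar layout itself can be illustrated with a text-only sketch. The paper drew its plots in R; the Python function below, with the hypothetical name `month_grid`, only shows how each day's cluster label could be placed in a Monday-first month grid. The labels in the example are made up for illustration and are not results from the study.

```python
import calendar
from datetime import date

def month_grid(year, month, labels):
    """Return one month as rows of seven cells, Monday first.

    `labels` maps a date to a short cluster label such as "S1".
    Cells outside the month, or days without a label, are shown as "..".
    """
    rows = []
    for week in calendar.monthcalendar(year, month):
        # monthcalendar uses 0 for cells that fall outside the month
        rows.append([
            labels.get(date(year, month, day), "..") if day else ".."
            for day in week
        ])
    return rows

# Illustrative labels only; these are not the clusters reported in the paper.
example_labels = {date(2014, 1, 1): "S1", date(2014, 1, 2): "S2"}
january = month_grid(2014, 1, example_labels)
for row in january:
    print(" ".join(row))
```

Stacking twelve such grids, and colouring each cell by its label, gives the kind of calendar view described in the caption.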

Interpretation on calendar visualization

The calendar plot based on correlation distance (Fig. 1a) and the corresponding curves (Fig. 1c) show the cluster result based on shape differences among the 365 PM2.5 time-series. There are three distinct variation patterns. An increasing pattern from midnight to 11 PM is most likely to be observed from January to March and from September to December (S1 in Fig. 1a,c). On days with this pattern, the maximum PM2.5 concentration usually occurs at night. A decreasing pattern can be observed in all months of the year (S2 in Fig. 1a,c) and attains its minimum in the afternoon. The third pattern, shaped like an inverted V, often occurs from April to August (S3 in Fig. 1a,c), and PM2.5 concentrations on these days usually peak at noon. These results show that the diurnal pattern of PM2.5 varies from day to day through the year and that daytime PM2.5 concentration can exceed night-time concentration on many days. This complements previous studies concluding that the diurnal variation of PM2.5 changes with the seasons and that PM2.5 concentration is higher at night than in the daytime10,16.

Our findings are consistent with a previous study that identified a ‘sawtooth cycle’ of PM2.5 variation17. During a ‘sawtooth cycle’, the PM2.5 concentration first rises over a few days, which corresponds to the increasing pattern in our study (S1 in Fig. 1a,c), and then falls, which matches the decreasing pattern in our study (S2 in Fig. 1a,c)17. One possible interpretation is that the increasing and decreasing patterns (S1 and S2 in Fig. 1a,c) are largely shaped by the passage of cold fronts. When a cold front arrives, the high-speed wind associated with it blows the pollution away and the PM2.5 concentration decreases. After the cold front moves on, the cold air, being denser and heavier, lies beneath the warm air, which leads to a temperature inversion. The temperature inversion traps PM2.5 pollution near the surface and causes the PM2.5 concentration to increase.

Human activities such as heating and combustion, as well as weather conditions including wind and boundary layer height, are closely linked to the variation of PM2.5 concentration18,19. As Fig. 1a,c shows, not all variation patterns in PM2.5 concentration match the daily cycle of human activities such as transportation, which usually peaks in the morning and afternoon. The third pattern (S3 in Fig. 1a,c) comes closest to the daily cycle of human activities, but it usually occurs from April to August. This finding suggests that the effect of human activities on variations in PM2.5 concentration may differ between time periods. We speculate that from January to March and from September to December, weather conditions including cold fronts, wind and boundary layer height may be the major factors determining variations in PM2.5 concentration. From April to August, however, these weather conditions (e.g., cold fronts) weaken and human activities might therefore have a stronger impact on PM2.5 variation.

The cluster result based on differences in PM2.5 concentration levels can be found in the calendar plot based on Euclidean distance (Fig. 1b) and the corresponding curves (Fig. 1d). A majority of days in the year have an averaged PM2.5 concentration of around 50 μg/m3 (L1 in Fig. 1b,d), a figure far beyond the WHO (25 μg/m3) and USA (15 μg/m3) air quality standards. The calendar plot also indicates that high averaged PM2.5 concentrations of around 150 μg/m3 (L2 in Fig. 1b,d) are likely to occur in every month of the year. In addition, extremely high PM2.5 concentrations above 250 μg/m3 (L3, O1, O2, O3, O4 and O5 in Fig. 1b,d) are usually observed in January, February, March, October, November and December. This finding is consistent with previous studies concluding that PM2.5 concentration is generally highest during winter and lowest during summer15,16.

Outliers

A few “outliers” (O1, O2, O3, O4 and O5 in Fig. 1b,d) can be found in Fig. 1b. For example, two notable “outliers” O4 and O5 on January 15 and February 26, 2014, respectively, show quite drastic variations across the day. As we can see, extremely high PM2.5 concentrations (O5 has a maximum PM2.5 concentration of 534 μg/m3) are observed on the two days and the two incidents were reported by the Guardian20, Time magazine21 and Financial Times22.

One event of particular interest is the Asia-Pacific Economic Cooperation (APEC) Summit held in Beijing on 10 and 11 November 2014. It is reported that, in order to maintain a blue sky in Beijing during the APEC Summit, the governments of Beijing and six surrounding provinces took coordinated measures before the summit23. These included restrictions on road traffic and plant operations. The two calendar plots in our study indicate that PM2.5 concentration was very high in mid-October before the summit. For example, on October 19, 24 and 25, the PM2.5 concentration was over 150 μg/m3. After the emission control measures were enforced, the PM2.5 concentration was greatly reduced on November 1. On November 4, however, the PM2.5 concentration rose sharply to around 150 μg/m3. A significant reduction followed on November 5, and the PM2.5 concentration then returned to a lower level, where it remained until November 15, four days after the summit. These interpretations can also be obtained from local observations, but we note that the two calendar visualizations offer a much more straightforward view of the whole picture of PM2.5 variation over time than other tools.

Seasonal and weekly patterns?

As the two cluster results show, both shape and level variation exhibit a rough seasonal pattern, but the pattern does not follow strict seasonal divisions. As Fig. 1a shows, the S1 pattern usually occurs around winter (from January to March and from September to December) and the S3 pattern often occurs around summer (from April to August). Figure 1b shows a rough seasonal pattern too. Days in the L3 cluster usually occur near winter (in February, March, October, November and December but not January), although days in the L1 and L2 clusters can be found in any month of the year and do not exhibit a clear seasonal pattern. There may be significant differences in PM2.5 concentration levels between seasons10,15,16; however, we argue that an arbitrary seasonal division of the variation in PM2.5 concentration may result in information loss and conceal potentially important insights. The calendar visualization used in our study, by contrast, provides an informative and straightforward way to look into variation patterns of air pollutants.

Several studies reported weekly patterns in PM2.5 concentration in Beijing9,13, but their findings are not consistent with each other. One study stated that the lowest concentrations occurred on Mondays while the highest concentrations appeared from Thursday to Saturday9; another concluded that PM2.5 concentrations on weekdays were lower than those on weekends13. We do not observe these reported weekly patterns. Figure 1b shows that, among all 52 weeks in 2014, PM2.5 concentrations are higher on weekdays than on weekends in at least 18 weeks. For example, from March 24 to 30, the lowest PM2.5 concentrations were observed on the weekend while the highest were on weekdays (Fig. 1b). Further calculations show that about half of all weeks in 2014 have higher averaged concentrations on weekdays, that the lowest PM2.5 concentration falls on Monday in only 13 weeks, and that the highest concentration falls between Thursday and Saturday in only 30 weeks. Our results do not support the reported weekly patterns.
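The weekday versus weekend comparison can be sketched as follows. This is a minimal illustration, not the authors' calculation: the function name `weeks_with_higher_weekdays`, the Monday-to-Sunday week definition and the rule of skipping incomplete weeks are all assumptions, and the input values are synthetic.

```python
from collections import defaultdict
from datetime import date, timedelta

def weeks_with_higher_weekdays(daily_mean):
    """Count complete Monday-Sunday weeks whose weekday mean exceeds the weekend mean.

    `daily_mean` maps a date to that day's averaged PM2.5 concentration.
    Returns (weeks with higher weekday mean, complete weeks examined).
    """
    weeks = defaultdict(lambda: {"weekday": [], "weekend": []})
    for day, value in daily_mean.items():
        monday = day - timedelta(days=day.weekday())   # key each week by its Monday
        kind = "weekend" if day.weekday() >= 5 else "weekday"
        weeks[monday][kind].append(value)

    higher = total = 0
    for parts in weeks.values():
        # Assumption: only weeks with all five weekdays and both weekend days count.
        if len(parts["weekday"]) == 5 and len(parts["weekend"]) == 2:
            total += 1
            weekday_mean = sum(parts["weekday"]) / 5
            weekend_mean = sum(parts["weekend"]) / 2
            if weekday_mean > weekend_mean:
                higher += 1
    return higher, total

# Synthetic example: one week with polluted weekdays, one with polluted weekends.
start = date(2014, 3, 24)  # a Monday
synthetic = {}
for offset in range(14):
    day = start + timedelta(days=offset)
    first_week = offset < 7
    is_weekend = day.weekday() >= 5
    synthetic[day] = 100.0 if first_week != is_weekend else 40.0

higher, total = weeks_with_higher_weekdays(synthetic)
```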

We did not observe any explicit and universal weekly variation pattern after visual inspection over the two calendar plots (Fig. 1a,b) and further calculations. This finding suggests that the weekly cycle of human activities may not play a key role in determining variations in PM2.5 concentration. Our finding complements and improves previous studies that report weekly patterns in PM2.5 concentration in Beijing9,13.

Future research

As we know, PM2.5 pollution can be measured in terms of optical properties and chemical compositions in addition to the mass concentration24,25,26,27. With the help of the calendar visualization technique used in this study, these informative properties and other air pollutants such as NO2, SO2 and O3 can help provide a better understanding of the air pollution problem.

Data and Methods

Data

The PM2.5 measurement data in Beijing used in this study were originally obtained from the official hourly air quality reporting platform (http://zx.bjmemc.com.cn/) run by Beijing Environment Protection Agency. This platform is part of the national air quality monitoring network initiated in late 2012. The data are rich, reporting hourly concentrations of six pollutants at 35 stations across Beijing (Fig. 2): particulate matter with aerodynamic diameter no greater than 2.5 microns (PM2.5), particulate matter with aerodynamic diameter less than 10 microns (PM10), sulphur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3) and carbon monoxide (CO). However, the data are not easily accessible because the online reporting system only reports the air quality of the current day; historical data are not shown and are unavailable to the public. Fortunately, third-party civic initiatives such as PM25.in, AQISTUDY.cn and EPMAP.org have been crawling these data since late 2013.

Figure 2

Air Quality Monitoring Stations in Beijing.

The map is generated by the authors using ArcGIS 10.2.2 (www.esri.com).

Our study uses one year of air quality monitoring data, from 1 January 2014 to 31 December 2014, from AQISTUDY.cn, EPMAP.org and the US Embassy Beijing Air Quality Monitor (Fig. 2). We noticed that hourly measurements are missing in all three data sources. Therefore, we combined them to obtain complete PM2.5 measurement data covering all 24 hours of all 365 days in 2014. The US Embassy Beijing Air Quality Monitor is operated by the US Department of State. The US Department of State requires that the following disclaimer be included in any publication that uses these data: “State Air observational data are not fully verified or validated; these data are subject to change, error and correction. The data and information are in no way official”.
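The text does not state the rule used to combine the sources, so the following is only a sketch of one plausible approach: take each hour from the first source that has a value for it. The function name `combine_sources`, the timestamp keys and the priority order are assumptions made for illustration, and the readings are invented.

```python
def combine_sources(*sources):
    """Merge hourly readings from several sources, in priority order.

    Each source maps an hour key such as "2014-01-01 00" to a PM2.5 value
    in ug/m3, or to None when that hour is missing. The first non-missing
    value wins. This priority rule is an assumption, not the paper's method.
    """
    combined = {}
    for source in sources:
        for hour, value in source.items():
            if value is not None and combined.get(hour) is None:
                combined[hour] = value          # fill a gap left by earlier sources
            else:
                combined.setdefault(hour, None)  # remember the hour even if still missing
    return combined

# Invented readings from two hypothetical sources.
source_a = {"2014-01-01 00": 53.0, "2014-01-01 01": None}
source_b = {"2014-01-01 00": 55.0, "2014-01-01 01": 61.0, "2014-01-01 02": None}
merged = combine_sources(source_a, source_b)
```

Hours that remain `None` after merging are the ones no source covers, which makes any residual gaps easy to list.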

A comprehensive data quality check on the raw data is conducted to reduce the impact of problematic data points, including duplicated data records, missing measurements with a placeholder, implausible zeros, etc. After the data quality check, the hourly PM2.5 measurement data for all 35 stations are then aggregated into one averaged PM2.5 concentration per hour for cluster analysis as explained below.
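The quality check and aggregation step might look like the sketch below. The placeholder value `-999.0`, the record layout and the function name `hourly_city_mean` are assumptions; the paper names the categories of problematic data but not how they were encoded or handled.

```python
from collections import defaultdict

PLACEHOLDER = -999.0  # assumed missing-value marker; the paper does not name one

def hourly_city_mean(records):
    """Average cleaned station readings into one city-wide value per hour.

    `records` is an iterable of (hour, station, value) tuples.
    """
    seen = set()
    by_hour = defaultdict(list)
    for hour, station, value in records:
        if (hour, station) in seen:
            continue                      # duplicated record: keep the first only
        seen.add((hour, station))
        if value is None or value == PLACEHOLDER or value <= 0:
            continue                      # missing, placeholder, or implausible zero
        by_hour[hour].append(value)
    return {hour: sum(values) / len(values) for hour, values in by_hour.items()}

# Invented records for two hours and three hypothetical stations.
records = [
    ("h0", "s1", 40.0),
    ("h0", "s1", 40.0),      # duplicate
    ("h0", "s2", 60.0),
    ("h0", "s3", -999.0),    # placeholder for a missing measurement
    ("h1", "s1", 0.0),       # implausible zero
    ("h1", "s2", 30.0),
]
city_mean = hourly_city_mean(records)
```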

Method

Since we have 24 hourly PM2.5 measurements for each day, we have 365 time-series objects with 24 data points each to analyse. We would like to group time-series objects with similar variation patterns of PM2.5 concentration and separate those with dissimilar patterns into different groups. We therefore employ a time-series clustering technique to mine the data.
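Turning the hourly record into 365 daily series can be sketched as below. The function name `daily_series` and the use of `datetime` keys are assumptions for illustration, and the values are synthetic.

```python
from datetime import date, datetime, timedelta

def daily_series(hourly, year=2014):
    """Split a {datetime: value} mapping into {date: [24 hourly values]}.

    Hours absent from `hourly` appear as None so that every day has 24 slots.
    """
    days = {}
    moment = datetime(year, 1, 1)
    while moment.year == year:
        days.setdefault(moment.date(), []).append(hourly.get(moment))
        moment += timedelta(hours=1)
    return days

# Synthetic hourly record in which each value simply equals the hour of day.
start = datetime(2014, 1, 1)
hourly = {start + timedelta(hours=h): float(h % 24) for h in range(365 * 24)}
series = daily_series(hourly)
```

Each value in `series` is then one 24-point time-series object of the kind passed to the clustering step.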

In general, a clustering analysis has two essential components: a clustering algorithm and a distance measure28. The clustering algorithm controls the procedure by which similar objects are clustered, while the distance measure establishes the resemblance between two objects. Several algorithms and distance measures are available in the field of cluster analysis, and our study employed the most straightforward and suitable ones. Specifically, we use average-linkage agglomerative hierarchical clustering. Unlike K-means, this method generates repeatable and consistent results and does not require the number of clusters to be specified29, and it is usually able to obtain more robust cluster results than other hierarchical clustering methods30.
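A minimal sketch of average-linkage agglomerative clustering is given below, assuming NumPy and SciPy are available. The paper does not say which library it used or how the dendrogram was cut, so the SciPy calls, the synthetic data and the cut threshold of 500 are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in for the 365 x 24 matrix of daily series:
# three low-concentration days and two high-concentration days.
rng = np.random.default_rng(0)
low_days = 50 + rng.normal(0, 2, size=(3, 24))
high_days = 250 + rng.normal(0, 2, size=(2, 24))
days = np.vstack([low_days, high_days])

# Pairwise Euclidean distances between days, in condensed form.
distances = pdist(days, metric="euclidean")

# Average linkage builds the whole dendrogram; no cluster count is needed here.
tree = linkage(distances, method="average")

# The dendrogram is cut afterwards. The threshold is illustrative only.
labels = fcluster(tree, t=500, criterion="distance")
print(labels)
```

Because the procedure has no random initialisation, running it again on the same distances yields the same dendrogram, which is the repeatability property the text contrasts with K-means.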

Distance measures were selected based on two basic features of the PM2.5 time-series data: level and shape. Level refers to the magnitude of PM2.5 concentration, and the Euclidean distance is used to identify the level difference between PM2.5 time-series. Shape refers to the trend of PM2.5 concentration over time, and we use Pearson’s correlation-based distance to capture the shape difference between PM2.5 time-series. We derived a generalized correlation-based dissimilarity function from an earlier study31 by making the coefficient α and the power β adjustable (equation (1)).

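where the correlation coefficient $r$ between two time-series $x$ and $y$, each with $n$ hourly values, is Pearson’s correlation coefficient,

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

with $\bar{x}$ and $\bar{y}$ denoting the means of $x$ and $y$.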

This dissimilarity function satisfies all the requirements for a dissimilarity measure: non-negativity, symmetry and identity32,33. When both α and β are set to 1, the function becomes the classic Pearson’s correlation coefficient distance that has been used in several studies34. In our study, however, we deliberately set α and β to 0.5 and 0.25, respectively, in order to obtain a robust cluster result.
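Equation (1) is not reproduced in this text, so the sketch below uses an assumed form, d = (α(1 − r))^β, chosen only because it reduces to the classic distance 1 − r when α = β = 1, as the paragraph above requires. The authors' actual function may differ, and the names `pearson_r` and `correlation_dissimilarity` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series.

    Undefined (division by zero) if either series is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def correlation_dissimilarity(x, y, alpha=1.0, beta=1.0):
    """ASSUMED form d = (alpha * (1 - r)) ** beta; not taken from the paper."""
    r = pearson_r(x, y)
    base = max(0.0, alpha * (1.0 - r))  # guard against tiny negative rounding errors
    return base ** beta

rising = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]      # same shape, different level
falling = [4.0, 3.0, 2.0, 1.0]     # opposite shape

same_shape = correlation_dissimilarity(rising, scaled)
classic_opposite = correlation_dissimilarity(rising, falling)
tuned_opposite = correlation_dissimilarity(rising, falling, alpha=0.5, beta=0.25)
```

Under this assumed form, setting α = 0.5 rescales the distance to the range 0 to 1, and a power β below 1 stretches small distances, so days with only slightly different shapes are separated more clearly.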

We employ the cophenetic correlation coefficient to examine the validity and robustness of the cluster analysis. The cophenetic correlation coefficient is a measure of how faithfully the hierarchical cluster results represent the dissimilarity among observations35. It is defined as the linear correlation coefficient between the original pairwise dissimilarities and the cophenetic dissimilarities obtained from the dendrogram. In practice the value of this coefficient lies between 0 and 1. A higher cophenetic correlation coefficient indicates a better cluster solution, and a value of 0.8 or higher is usually regarded as a successful cluster application36.

It turns out that the cophenetic correlation coefficients for Euclidean-distance-based and correlation-distance-based cluster analyses are 0.86 and 0.81, respectively, suggesting that both cluster results are robust and valid.
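The cophenetic check can be sketched with SciPy's `cophenet` function, assuming NumPy and SciPy are available. The data below are synthetic, so the resulting coefficient illustrates the calculation only and is unrelated to the 0.86 and 0.81 reported above.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Synthetic daily series: ten days near 50 ug/m3 and ten days near 150 ug/m3.
rng = np.random.default_rng(1)
days = np.vstack([
    50 + rng.normal(0, 5, size=(10, 24)),
    150 + rng.normal(0, 5, size=(10, 24)),
])

# Original pairwise dissimilarities and the average-linkage dendrogram.
original = pdist(days, metric="euclidean")
tree = linkage(original, method="average")

# cophenet returns the correlation between the original dissimilarities and
# the cophenetic ones, together with the cophenetic dissimilarities themselves.
coefficient, cophenetic = cophenet(tree, original)
print(round(float(coefficient), 3))
```

The cophenetic dissimilarity of two days is the dendrogram height at which they are first merged into one cluster, so a high coefficient means the tree preserves the original pairwise structure well.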

We used Python version 2.7.5 to process and analyse the data and R version 3.2.2 to draw the calendar plots.

Additional Information

How to cite this article: Liu, J. et al. Temporal Patterns in Fine Particulate Matter Time Series in Beijing: A Calendar View. Sci. Rep. 6, 32221; doi: 10.1038/srep32221 (2016).