Introduction

Ozone in the stratosphere forms the ozone layer, which is essential for absorbing the Sun’s ultraviolet radiation and protecting life on Earth. However, ozone in the lower troposphere is an air pollutant that seriously harms human health and ecosystems, for example by damaging human lungs and destroying crops and forest vegetation1. Numerous studies have indicated that, worldwide, several hundred thousand premature deaths annually are associated with ground-level O3 pollution2. Surface O3 is a secondary air pollutant: it is not directly emitted by vehicles or industries but is produced by photochemical reactions of nitrogen oxides and hydrocarbons in the air in the presence of sunlight3,4. In recent years, it has become a significant pollutant in major urban areas of China5,6,7. Notably, a recent study indicated a substantial rise in O3 pollution across Northern China. However, the trends of O3 pollution in several regions of China do not exhibit significant increases compared to the influences of natural and climate variability8. As such, predicting surface O3 concentrations is crucial for mitigating the damage caused by air pollution in China. In particular, long-term predictions are invaluable as they enable governments to strategically plan air pollution control measures months or even a year in advance.

Efforts to improve predictions of surface O3 concentrations have involved investigating the primary factors influencing them7. Not only do emissions of surface O3 precursors play a significant role8,9, but large-scale ocean-atmosphere circulations also have substantial impacts on O3 levels10. Connections between large-scale ocean-atmosphere circulations and local air pollution are critical for long-term air pollution forecasting11. Previous research has demonstrated that large-scale climatic patterns can be harnessed to predict surface-level O3 concentrations in the United States up to a season in advance, attributed to the interactions between large-scale ocean-atmospheric circulations and the inherent long-term memory10. Additionally, it has been observed that the North Atlantic Oscillation-driven anomalous atmospheric circulations influence O3 pollution in Europe by modulating the photochemical reactions involved12,13.

Furthermore, certain meteorological conditions such as drought, elevated temperatures, and intense sunlight are conducive to the formation of surface O3 pollution. Studies investigating the relationships between these meteorological conditions and broader climatic influences have unveiled the potential for seasonal prediction of surface O3 concentrations14. Springtime warming in the Western Pacific Ocean, Western Indian Ocean, and Ross Sea is related to the interannual shifts in the frequency of simultaneous occurrences of heat waves and O3 pollution in China during summer15. Arctic sea ice has also been identified as an indirect influence on surface O3 levels in Northern China through its effect on meteorological patterns across Eurasia16. Rossby waves, known for triggering anticyclones in Northern China, create an environment characterized by stable air, high temperatures, and low humidity, which consequently traps particulate matter and O3 pollutants near the surface17,18. Utilizing a global three-dimensional Goddard Earth Observing System Chemical transport model (GEOS-Chem), researchers have found that the East Asian summer monsoon plays a significant role in the interannual variability of summer surface O3 in China19. Moreover, the El Niño/Southern Oscillation (ENSO), a paramount inter-annual climate phenomenon, has been linked to variations of the total ozone column amounts by affecting tropopause height20,21,22. Global climate change has also been implicated in the rising trends of surface O3 concentrations in China16,23.

However, extant literature primarily concentrates on O3 pollution in specific regions of China (e.g. north China) affected by several teleconnection patterns or the long-term variability of O3-related meteorological conditions under climate effects. These studies did not directly address the long-term prediction of O3 pollution across China. It is essential to recognize that predictions of O3-related meteorological conditions and of O3 pollution itself are not completely equivalent. Moreover, O3 concentrations in different regions within China are correlated, which gives rise to distinct O3 fluctuation patterns24. Consequently, the objectives of this study are to identify the important fluctuation patterns in surface O3 concentrations across China and to examine their associations with global sea surface temperature anomalies (SSTA). SSTA has well-documented impacts on atmospheric circulation and, in turn, on ozone levels25. Furthermore, the long-lasting memory of SSTA, ranging from months to years, harbors potential seasonal prediction power26,27.

To identify crucial patterns of O3 fluctuations, eigenanalysis emerges as a potent and efficient technique for decomposing multiple fluctuation patterns into a set of independent principal modes. Common methods include classic principal component analysis (PCA) and empirical orthogonal function (EOF) analysis, recognized as efficient tools in climate studies and various investigations into O3 pollution10,24,28,29. For instance, two dominant patterns of summer O3 pollution in China have been discerned with the influence of the West Pacific Oceans30. Additionally, an advanced Eigen Microstate method, approached from a statistical physics standpoint31,32,33, has been employed. Thus, here we focus on predicting the time series of the first dominant pattern, effectively addressing the intricate nature of ozone pollution across various regions in China, including the north and south. This involves extracting the system’s eigen or principal modes and examining their temporal evolution.

Results

Spatial and temporal characteristics of the surface O3 pattern in China

We first perform data preprocessing to derive the detrended surface O3 and SSTA. To investigate the spatiotemporal characteristics of O3 patterns over China, we apply eigenanalysis to the detrended O3 field for the summer and winter of the training period (June 2013 to February 2018) and obtain eigenvectors and principal modes for both seasons (refer to Methods). We find that the 1st eigenvalues account for 22.8% and 36.8% of the total variance in summer and winter, respectively. The spatial distributions of the corresponding eigenvector u1 are predominantly positive across China in both seasons, with notable differences in regional concentration (see Fig. 1a and b). During the summer season, the most significant positive components, indicating elevated O3 concentrations, are observed in the northern regions of China, specifically in Hebei and Shanxi, as illustrated in Supplementary Fig. 1a. This pattern is attributable to heightened precursor emissions34 and to a high-pressure anomaly in north China. This anomaly in turn induces high-temperature and low-humidity anomalies that enhance photochemical reactions, all of which contribute to O3 production30.

Fig. 1: Spatial and temporal characteristics of the surface O3 pattern in China and significance tests.

Spatial distribution of the 1st eigenvector of the O3 principal mode in a summertime and b wintertime, respectively. Time series of PC1 in c summertime and d wintertime, respectively. PDFs of the absolute correlations \(| C{C}_{{v}_{1},{s}_{i}}^{* }|\) between PC1 and SSTA nodes for real data (red area) and the synthetic data for WNNM (green) and MANM (blue) for e summertime and f wintertime, respectively. Black dashed line represents 97.5th percentile of the synthetic correlations of MANM.

Conversely, during the winter season, Southern China emerges as the region with the highest O3 concentration, as depicted in Supplementary Fig. 1b. This phenomenon can be attributed to the long-range transport of O3 and O3 precursors within the polluted air masses originating from the northern regions, alongside photochemical formation under dry and sunny weather conditions7,35. Additionally, the first principal component (PC1), which corresponds to the temporal evolution of the primary mode, is computed from Eq. (1) of Methods and is illustrated in Fig. 1c and d. Notably, it displays trends that are akin to the average O3 concentrations over China during the same period (see Supplementary Fig. 1c, d). Note, however, that PC1 captures the fluctuations of the regions that dominate O3 pollution in China rather than the ozone fluctuations at every location in the country.

The 2nd eigenvalues explain 11.3% and 11.7% of the total variance in summer and winter, respectively. The spatial distributions of the 2nd eigenvectors (Supplementary Fig. 2a and b) reveal a distinct boundary between northern and southern clusters, reflecting the influence of diverse weather systems on surface O3 fluctuations. For example, Northern China is susceptible to Rossby waves, which can trigger heat or cold waves originating from Siberia17,18, whereas Southern China is influenced by tropical circulations36.

Global SSTA linked to the surface O3 pattern in China

To elucidate the relationship between dominant surface O3 patterns and global climate, we construct a network as described in Methods. The cross-correlation between PC1 and the global SSTA time series is computed based on Eq. (2) (refer to Methods). To conduct significance tests, we generate synthetic correlations using two different null models: the White Noise Null Model (WNNM) and the Monthly Autocorrelation-preserving Null Model (MANM). In WNNM, the daily SSTA time series and PC1 are completely shuffled, resulting in typical white noise time series. MANM, on the other hand, preserves the order within each one-month time window, corresponding to the typical timescale of SSTA memory, and shuffles the order of the time windows. The MANM test thus maintains the autocorrelation within a month but eliminates the true correlation between the two time series. Supplementary Table 1 shows that the MANM test identifies spurious correlations more reliably than the conventional T-test10,27,37.
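The two shuffling schemes can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the fixed 30-day block length, and the synthetic input series are assumptions made for illustration, and the treatment of season boundaries is not specified in the text.

```python
import numpy as np

def shuffle_white_noise(series, rng):
    """WNNM-style surrogate: permute every daily value independently."""
    return rng.permutation(series)

def shuffle_blocks(series, block_len, rng):
    """MANM-style surrogate: keep the order inside each block, permute the blocks."""
    n_blocks = len(series) // block_len
    # Drop an incomplete trailing block so that all blocks have equal length.
    blocks = series[: n_blocks * block_len].reshape(n_blocks, block_len)
    return blocks[rng.permutation(n_blocks)].reshape(-1)

rng = np.random.default_rng(0)
# Hypothetical autocorrelated daily series standing in for one SSTA node.
ssta = np.cumsum(rng.normal(size=360))
wn_surrogate = shuffle_white_noise(ssta, rng)
block_surrogate = shuffle_blocks(ssta, block_len=30, rng=rng)
```

Both surrogates contain exactly the same values as the input; only the block version keeps the day-to-day ordering inside each 30-day window, which is why its correlations with another series tend to be larger than those of the fully shuffled version.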

Figure 1e, f shows the Probability Density Functions (PDFs) of the absolute correlation \(| C{C}_{{v}_{1},{s}_{i}}^{* }|\) for the observed data (depicted in red), in contrast to WNNM (in green) and MANM (in blue). It is evident that the correlations for MANM are considerably larger than those of WNNM, which can be attributed to the influence of autocorrelation. However, the peaks of the PDFs are distinct between the observed and the null models for both summer and winter, as depicted in Fig. 1e and f. The observed data exhibits significantly larger correlations. To eliminate spurious correlations, we employ the 97.5% significance level of the MANM test as the threshold Δ (refer to Fig. 1e and f). Correlations below this threshold do not establish a link between PC1 and SSTA nodes.
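As a rough illustration of the thresholding step, the sketch below takes the 97.5th percentile of absolute null-model correlations as Δ and flags observed correlations above it. The surrogate and observed correlation values are synthetic placeholders, not results from this study, and the function name is hypothetical.

```python
import numpy as np

def significance_threshold(null_correlations, percentile=97.5):
    """Threshold taken as a percentile of the absolute null-model correlations."""
    return np.percentile(np.abs(null_correlations), percentile)

rng = np.random.default_rng(4)
# Hypothetical surrogate correlations, e.g. pooled over nodes and shuffles.
null_cc = rng.normal(scale=0.1, size=10_000)
# Hypothetical observed correlations CC* for four SSTA nodes.
observed_cc = np.array([0.05, -0.45, 0.30, -0.10])

delta = significance_threshold(null_cc)
exceeds_threshold = np.abs(observed_cc) > delta
```

Only the correlation criterion is shown here; the delay-time condition on τ* is applied separately.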

Figure 2a and b illustrate the spatial distributions of significant correlations \(C{C}_{{v}_{1},{s}_{i}}^{* }\) exceeding the threshold across global SSTA for summer and winter, respectively. The regions with positive and negative correlations between PC1 and SSTA nodes are represented in red and blue, respectively. Certain regions exhibit robust correlations in both seasons (above 0.5 or below − 0.5). The associated delay times are derived as outlined in the Methods section and are depicted in Fig. 2c and d. Notably, some SSTA regions have delay times exceeding 90 days, suggesting long-term memory behavior between these SSTA regions and surface O3 in China, and the potential for predicting ozone levels more than one season in advance. To pinpoint critical SSTA regions, we identify the critical SSTA nodes where \(| C{C}_{{v}_{1},{s}_{i}}^{* }| > {{\Delta }}\) and τ* > 90. We select the four largest clusters formed by spatially contiguous critical SSTA nodes, as shown in Fig. 2e and f. These clusters are teleconnected to surface O3 in China over distances exceeding 3000 km, with the exception of cluster CS3 in Fig. 2e, which is proximate to Eastern China.

Fig. 2: Global SSTA linked to the surface O3 pattern in China.

Distributions of significant correlation coefficients over global SSTA nodes linked to PC1 of surface O3 in China for a summertime and b wintertime, respectively. Distributions of the delay times corresponding to the correlation coefficients for c summertime and d wintertime. Four clusters of SSTA connected to the surface O3 with the largest areas and long-term delay times of more than 90 days for e summertime and f wintertime, respectively.

Variations in SSTA, exhibiting inertial memory over months to years, can induce anomalies in large-scale atmospheric circulation38,39. These atmospheric circulation changes, in turn, could have repercussions on air pollution and surface O3 levels in China40,41. To delve into the underlying physical mechanisms associated with the identified SSTA clusters, we present the most relevant atmospheric circulation indexes corresponding to the identified SSTA clusters in Fig. 2e and f. Specifically, the summer clusters CS1, CS2, CS3, and CS4 are linked to the Walker circulation (Niño 1+2)42, the North Pacific High (NPH)43, the West Pacific Subtropical High (WPSH)44, and the Pacific-North American teleconnection pattern (PNA)45, respectively (see Fig. 3a). The significant correlation coefficients above 0.5 between the atmospheric circulation indexes and the corresponding SSTA clusters are shown in Table 1. Furthermore, these atmospheric circulation indexes exhibit significant correlations with PC1 of surface O3 pollution over months, as well as with the SSTA clusters (see Table 1).

Fig. 3: Atmospheric circulations related to SSTA clusters.

Sketch maps of four atmospheric circulations corresponding to four SSTA clusters, influencing PC1 of surface O3 pollution in China during a summer and b winter.

Table 1 Relationships between the atmospheric circulation indexes and the corresponding SSTA clusters and PC1 of surface O3 pollution in months.

Supplementary Figure 3 illustrates the time-delayed average atmospheric field anomalies following strong or weak atmospheric circulations. In Supplementary Figure 3a, the weak Walker circulation induces a high-pressure anomaly over the Western Pacific Ocean and Eastern Asia. Similarly, the weak NPH, the strong WPSH, and the strong PNA also lead to high-pressure anomalies in north China (see Supplementary Fig. 3c, e, g) associated with lower humidity, elevated temperatures, reduced cloud cover, and intensified solar radiation. These conditions create favorable environments for photochemical reactions that generate surface O316,20,21,22,23,46,47. Conversely, the opposite atmospheric field anomalies are associated with low-pressure anomalies and low levels of surface O3 pollution in Supplementary Fig. 3b, d, f, h.

In winter, clusters CW1, CW2, CW3, and CW4 are associated with the Southern Oscillation Index (SOI)48, the North Atlantic Oscillation (NAO)49, the Amundsen Sea Low (ASL)50, and the Madden-Julian Oscillation (MJO)51, respectively, as depicted in Fig. 3b. The correlation coefficients between them are presented in Table 1. During winter, Southern China emerges as a dominant contributor to surface O3 pollution, indicating mechanisms distinct from those in summer. SOI is negatively correlated with cluster CW1 and PC1 (see Table 1). Following the negative phase of SOI, East Asia experiences a strong north-to-south wind anomaly transporting O3 pollution and O3 precursors from northern regions (see Supplementary Fig. 4a). During the negative phases of NAO, ASL, and MJO, the atmospheric fields exhibit a similar wind pattern facilitating the transport of O3 pollution (Supplementary Fig. 4c, e, g). The atmospheric fields for the opposite phases are depicted in Supplementary Fig. 4b, d, f, h. It is noteworthy that these atmospheric circulation indexes are interdependent; for instance, ASL can influence MJO and SOI, subsequently affecting O3 pollution in China52,53,54,55.

Seasonal prediction of the surface O3 pattern in China

To assess the predictive power of these SSTA clusters for surface ozone in China, we employ a multiple linear regression model. The model, based on a combination of these clusters, predicts the time series of the first principal component (PC1), smoothed with a 15-day moving average, for summertime and wintertime surface O3 in China more than 90 days ahead. Using a training dataset from June 2013 to February 2018, the model is obtained by Eq. (4) (see Methods). The R-values R0 between predicted and observed data are 0.89 and 0.81 for summer and winter, respectively (Fig. 4a, b). For the testing dataset from June 2018 to February 2023, the R-values R1 are lower than those of the training dataset but remain around 0.5 for both seasons (Fig. 4a, b). To evaluate the significance of the model’s predictive performance, we conduct the MANM test by shuffling the time series of SSTA nodes within each cluster. R1 for the prediction model exceeds the 97.5th percentile of the null-model distribution (Fig. 4c, d). Supplementary Fig. 5 presents the absolute correlation coefficients between each SSTA cluster and PC1 for each year within the testing periods, which reveal variations in predictive power across different years and for different SSTA clusters.

Fig. 4: Seasonal prediction of the surface O3 pattern in China.

Prediction of the time series of PC1 (with the 15-day moving average) for the surface O3 in China based on the multiple linear regression model for a summertime and b wintertime, respectively. The blue shaded area shows the prediction for the testing dataset. The black dotted line represents the real PC1 time series with the 15-day moving average. Significance test of the R-value for the prediction of the testing dataset for c summertime and d wintertime, respectively. The red area is above the 97.5th percentile.

Discussion

In this study, we explored the realm of long-term surface ozone (O3) pollution prediction in China. Initially, we employed eigen techniques to extract dominant principal modes characterizing China’s summer and winter O3 pollution patterns. Notably, the spatial distribution corresponding to the first mode exhibits consistent patterns, with critical regions suffering high levels of O3 pollution dominating this mode, a distinction apparent between summer and winter. Conversely, the spatial distribution associated with the second eigenvalue reveals a distinct boundary between northern and southern clusters, driven by differing meteorological conditions in Northern and Southern China.

Furthermore, we calculated cross-correlations with time delays between PC1 and the SSTA time series. This analysis highlights four crucial SSTA clusters that significantly influence PC1 of O3 pollution. Our findings reveal that summer O3 pollution is linked to the Walker circulation, the North Pacific High, the West Pacific Subtropical High, and the Pacific-North American teleconnection pattern corresponding to the four SSTA clusters. These atmospheric circulations with anomalies can create favorable environments for photochemical reactions that generate surface O3 pollution. Winter O3 pollution is associated with the Southern Oscillation, the North Atlantic Oscillation, the Amundsen Sea Low, and the Madden-Julian Oscillation.

To enhance predictive capabilities, we proposed a statistical model to forecast the first principal component of O3 pollution in China for both summer and winter seasons. This model is based on PC1 and its association with the states of the identified SSTA clusters, with a lead time of at least 3 months. With the training dataset, our model demonstrated high prediction accuracy, achieving R-values of 0.89 and 0.81 for summer and winter, respectively. In the testing dataset, the R-values remain close to 0.5 for both seasons. The performance of our prediction model indicates its proficiency in capturing general trends in the time series, despite its limitations in predicting short-term fluctuations.

The findings from our study can assist communities in anticipating climate conditions affecting ozone pollution one season in advance, allowing for the implementation of emission control measures to minimize the adverse impacts of deteriorating air quality due to unfavorable climate conditions.

Methods

Data

The study utilizes the maximum daily 8 h average (MDA8) surface O3 concentration dataset (version 2) over China from June 2013 to February 2023. The dataset, with a grid resolution of 0.1° × 0.1°, is sourced from the Tracking Air Pollution in China (TAP). It merges ground measurements, satellite retrievals, chemical transport model output, land-use information, and meteorological fields with machine learning models, ensuring its reliability56,57. For each grid j, a time series {xj(t)} of O3 concentration is available. This study focuses on summertime (June-July-August) and wintertime (December-January-February). The global daily average sea surface temperature (SST) data are taken from the ERA5 Reanalysis58, with a resolution of 0.25° × 0.25°, and cover 1991 to 2023.

Data detrending

The O3 concentration dataset is divided into two periods: (1) June 2013 to February 2018, serving as the training period for model fitting; (2) June 2018 to February 2023, serving as the testing period to assess the model’s reliability. The mean seasonal cycle (calculated based on the training period) for each calendar day is subtracted from the O3 data for both periods. For SST, the data from 1991 to 2023 is divided into three parts (1991–2000, 2001–2010 and 2011–2023) to obtain the SSTA. The first two parts are based on the entire decade data to remove the seasonal cycle, while the last part is based on the data of the first 7 years (excluding the year corresponding to the test set).
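The seasonal-cycle removal can be sketched as follows, assuming the data are held in a NumPy array with a matching calendar-day label and a training mask. The function name and the tiny synthetic example (three "calendar days" over four "years") are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def remove_seasonal_cycle(values, day_of_year, is_training):
    """Subtract, for each calendar day, the mean over the training period only."""
    anomalies = np.empty_like(values, dtype=float)
    for day in np.unique(day_of_year):
        same_day = day_of_year == day
        # Climatology uses training samples only, so no test data leaks in.
        climatology = values[same_day & is_training].mean(axis=0)
        anomalies[same_day] = values[same_day] - climatology
    return anomalies

# Toy example: 3 calendar days repeated over 4 years, first 2 years = training.
day_of_year = np.tile(np.arange(3), 4)
is_training = np.arange(12) < 6
values = np.array([10.0, 20.0, 30.0] * 4) + np.repeat([0.0, 2.0, 4.0, 6.0], 3)

anomalies = remove_seasonal_cycle(values, day_of_year, is_training)
```

Computing the climatology from the training period alone and applying it to both periods mirrors the description above; every calendar day must appear at least once in the training period for the mean to be defined.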

Principal modes of O3 fluctuation

The detrended O3 field over China, comprising N grids and a time length of M, can be represented as a matrix \({{{\boldsymbol{Y}}}}={({{{{\boldsymbol{y}}}}}_{1},{{{{\boldsymbol{y}}}}}_{2},\ldots ,{{{{\boldsymbol{y}}}}}_{t},\ldots ,{{{{\boldsymbol{y}}}}}_{M})}^{T}\), where yt = (yt1, yt2, …, ytN) represents the state of the system at time t28. The covariance matrix \({{{\boldsymbol{C}}}}={{{{\boldsymbol{Y}}}}}^{T}{{{\boldsymbol{Y}}}}/M\) is constructed, and its eigenvalues λ1 ≥ λ2 ≥ … ≥ λN ≥ 0 and eigenvectors u1, u2, …, uN are obtained by solving the eigen equation Cu = λu. All eigenvectors are normalized and orthogonal to each other. The projection of the fluctuation field Y onto the n-th eigenvector \({{{{\boldsymbol{u}}}}}_{n}={({u}_{n1},{u}_{n2},\ldots ,{u}_{nN})}^{T}\) results in the n-th principal component:

$${{{{\boldsymbol{v}}}}}_{n}={{{\boldsymbol{Y}}}}{{{{\boldsymbol{u}}}}}_{n},$$
(1)

whose elements vtn, t = 1, 2, …, M, form a time series that can also be denoted as {vn(t)}. The contributions of principal components to the system’s evolution are sorted in decreasing order, with the first principal component being the most crucial.
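The decomposition and Eq. (1) can be sketched in NumPy as follows. This is a minimal illustration on a small random matrix; the function name and the synthetic data are assumptions, and the explained-variance ratio is computed here as λn divided by the sum of all eigenvalues.

```python
import numpy as np

def principal_modes(Y):
    """Return eigenvalues (descending), eigenvectors (as columns) and PCs of Y."""
    M = Y.shape[0]
    C = Y.T @ Y / M                        # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # symmetric solver, ascending order
    order = np.argsort(eigvals)[::-1]      # re-sort so that lambda_1 is largest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pcs = Y @ eigvecs                      # column n is v_n = Y u_n, Eq. (1)
    return eigvals, eigvecs, pcs

rng = np.random.default_rng(1)
Y = rng.normal(size=(200, 5))              # hypothetical field: M = 200, N = 5
Y -= Y.mean(axis=0)                        # anomalies have zero temporal mean

eigvals, eigvecs, pcs = principal_modes(Y)
explained = eigvals / eigvals.sum()        # fraction of variance per mode
```

For a gridded field with far more grid points than time steps, forming the N × N matrix explicitly can be costly; a singular value decomposition of Y yields the same modes without building C.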

Network construction

In our network, the nodes are defined as the principal components of O3 fluctuation (with a primary focus on the first mode) and SSTA grids59. The links, on the other hand, represent the connections between them. For a given SSTA grid i, the SSTA time series is denoted as {si(t)}. To quantify the relations between the PC1 of O3 fluctuation v1 and the SSTA time series, we calculate the cross-correlation function60,61,62,63:

$$C{C}_{{v}_{1},{s}_{i}}(\tau )=\frac{\left\langle ({v}_{1}(t)-\bar{{v}_{1}})\cdot ({s}_{i}(t-\tau )-\bar{{s}_{i}})\right\rangle }{\sqrt{\left\langle {\left[{v}_{1}(t)-\bar{{v}_{1}}\right]}^{2}\right\rangle }\cdot \sqrt{\left\langle {\left[{s}_{i}(t-\tau )-\bar{{s}_{i}}\right]}^{2}\right\rangle }},$$
(2)

where \(\bar{{v}_{1}}\) and \(\bar{{s}_{i}}\) represent the averages over the periods, and 0 ≤ τ ≤ 365 is the delay time. When considering either summertime or wintertime, the variable t spans the intermittent periods of that season across the years. Specifically, for summertime, the range is 365(yy − 1) + 152 ≤ t ≤ 365(yy − 1) + 243; for wintertime, it is 365(yy − 1) + 335 ≤ t ≤ 365yy + 59, where yy is the year index starting from 1. We denote the cross-correlation with the largest absolute value as \(C{C}_{{v}_{1},{s}_{i}}^{* }\), and record the corresponding delay time as τ*. When τ* > 90, the SSTA grid can provide potential prediction power more than one season in advance. Thus, a network link is defined when \(| C{C}_{{v}_{1},{s}_{i}}^{* }| > {{\Delta }}\) and τ* > 90; otherwise, there is no link between v1 and si. Δ is the threshold used to select significant correlations, which we set as the 97.5% significance level of the MANM test (see Fig. 1e and f).
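A minimal sketch of Eq. (2) and of the selection of CC* and τ* is given below. It assumes that the SSTA series is a continuous daily array and that the season days are given as integer positions into it; the function names and the synthetic series with a built-in 120-day delay are illustrative assumptions.

```python
import numpy as np

def lagged_cross_correlation(v, s, t_index, max_lag):
    """Correlation between v(t) and s(t - tau) for tau = 0..max_lag.

    v       : PC1 values on the selected season days
    s       : continuous daily SSTA series of one node
    t_index : integer positions in s matching the entries of v
    """
    if t_index.min() < max_lag:
        raise ValueError("s must start at least max_lag days before the first t")
    cc = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        cc[tau] = np.corrcoef(v, s[t_index - tau])[0, 1]
    return cc

def strongest_link(cc):
    """Return the signed correlation of largest magnitude and its delay time."""
    tau_star = int(np.argmax(np.abs(cc)))
    return cc[tau_star], tau_star

rng = np.random.default_rng(2)
s = rng.normal(size=1000)                   # hypothetical daily SSTA series
t_index = np.arange(500, 592)               # hypothetical 92-day season
# Synthetic PC1 that responds negatively to s with a 120-day delay.
v = -s[t_index - 120] + 0.1 * rng.normal(size=t_index.size)

cc = lagged_cross_correlation(v, s, t_index, max_lag=365)
cc_star, tau_star = strongest_link(cc)
is_link_candidate = tau_star > 90           # still subject to |CC*| > threshold
```

Keeping the sign of CC* while selecting it by magnitude allows positive and negative links to be distinguished afterwards.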

Multiple linear regression model

In this study, we construct a multivariate linear model by combining correlated clusters identified in previous steps. For SSTA cluster Cj, the time series of the weighted averages of SSTA is denoted as:

$$W{S}_{j}(t)=\frac{{\sum }_{i\in {C}_{j}}{s}_{i}(t)\cdot | C{C}_{{v}_{1},{s}_{i}}^{* }| }{{\sum }_{i\in {C}_{j}}| C{C}_{{v}_{1},{s}_{i}}^{* }| }.$$
(3)

The multiple linear regression model is then constructed as follows:

$${v}_{1}^{{\prime} }(t)=\mathop{\sum }\limits_{j=1}^{{N}_{c}}{a}_{j}\cdot W{S}_{j}(t-{\tau }_{j}^{* })+b,$$
(4)

where \({v}_{1}^{{\prime} }(t)\) represents the predicted PC1, Nc is the number of selected clusters, and \({\tau }_{j}^{* }\) is the delay time exceeding 90 days between PC1 and WSj, identified based on Eq. (2) for the training set. The parameters aj (j = 1, …, Nc) and b are fitted using five-year time window data and remain fixed for predicting PC1 in the following year. As the prediction extends to subsequent years, the parameters are updated based on the past five years.
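Equations (3) and (4) can be sketched with an ordinary least-squares fit, as below. The function names are hypothetical, the predictors are synthetic, and each predictor column is assumed to be already shifted by its own delay, i.e. column j holds WSj(t − τj*).

```python
import numpy as np

def weighted_cluster_series(ssta_nodes, cc_star):
    """Eq. (3): node series averaged with weights |CC*|; ssta_nodes is (T, n_nodes)."""
    w = np.abs(cc_star)
    return ssta_nodes @ w / w.sum()

def fit_linear_model(predictors, target):
    """Least-squares estimate of a_j and b in Eq. (4)."""
    design = np.column_stack([predictors, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[:-1], coef[-1]

def predict(predictors, a, b):
    """Evaluate Eq. (4) for lag-aligned predictors."""
    return predictors @ a + b

# Tiny check of Eq. (3): two nodes with equal |CC*| give a plain average.
ws_demo = weighted_cluster_series(np.array([[1.0, 3.0], [2.0, 6.0]]),
                                  np.array([-1.0, 1.0]))

rng = np.random.default_rng(3)
# Hypothetical lag-aligned cluster series for N_c = 4 clusters.
X_train = rng.normal(size=(300, 4))
true_a, true_b = np.array([1.0, -0.5, 0.3, 2.0]), 0.7
y_train = X_train @ true_a + true_b + 0.01 * rng.normal(size=300)

a, b = fit_linear_model(X_train, y_train)
X_test = rng.normal(size=(90, 4))
y_test = X_test @ true_a + true_b
r_value = np.corrcoef(predict(X_test, a, b), y_test)[0, 1]
```

In a rolling setup like the one described above, `fit_linear_model` would be called again on the most recent five years before each new prediction year, with the coefficients held fixed within that year.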