Abstract
This study presents a novel ensemble of surface ozone (O3) generated by the LEarning Surface Ozone (LESO) framework. The aim of this study is to investigate the spatial and temporal variation of surface O3. The LESO ensemble provides unique and accurate hourly (daily/monthly/yearly as needed) O3 surface concentrations at a fine spatial resolution of 0.1° × 0.1° across China, Europe, and the United States over a period of 10 years (2012–2021). The LESO ensemble was generated by establishing the relationship between surface O3 and satellite-derived O3 total columns together with high-resolution meteorological reanalysis data. This breakthrough overcomes the challenge of retrieving O3 in the lower atmosphere from satellite signals. A comprehensive validation indicated that the LESO datasets explained approximately 80% of the hourly variability of O3, with a root mean squared error of 19.63 μg/m3. The datasets convincingly captured the diurnal cycles, weekend effects, seasonality, and interannual variability, which can be valuable for research and applications related to atmospheric and climate sciences.
Background & Summary
Surface ozone (O3) pollution is a global concern due to its detrimental effects on public health1 and food security2. Surface O3, also known as ground-level O3 (up to roughly 3 km above the Earth’s surface), is formed through chemical reactions in the troposphere between volatile organic compounds (VOCs) and nitrogen oxides (NOx) in the presence of sunlight3. According to the latest global air quality guidelines (AQG-2021; ref. 4), the recommended level for the average of daily maximum 8-hour mean O3 concentration is 100 μg/m3. Long-term exposure to elevated levels of O3 has been found to result in the development of cardiovascular and respiratory diseases, as well as a decline in lung function5. From 2014 to 2021, the daily maximum 8-hour mean O3 concentration in Beijing consistently exceeded 100 μg/m3 during the months of April to August, with the highest concentration observed in June (~152 μg/m3)6. In addition, episodes of O3 pollution hinder the growth of plants and the accumulation of biomass, consequently leading to a decrease in crop yield7,8. Meanwhile, the connection between surface O3 and climate change has garnered considerable attention in academic discourse9,10,11.
The community has made significant progress in estimating regional surface O3 concentrations by integrating ground-based site measurements with satellite remote sensing12,13. However, the majority of these studies have focused on the daily surface O3 levels over China. To better analyze the spatial and temporal variability of surface O3 on a broader scope, it is valuable to generate a comprehensive ensemble of surface O3 concentrations that encompasses various hotspot regions worldwide. These datasets will not only contribute to an enhanced understanding of ecosystem resilience to climate change but also provide recommendations for globally coordinated O3 regulation.
The LEarning Surface Ozone (LESO)6,14 is a subset of the Learning Air Pollutants from Satellite Observations (LAPSO)15 system that employs advanced deep learning techniques to integrate multi-source datasets and infer spatial and temporal variability of air pollutants. The primary objective of LESO is to improve our understanding of the interactions between the atmospheric environment and human activities6. We used the state-of-the-art deep forest method16,17 to establish a relationship between the ground-based O3 measurements and satellite observations, as well as meteorological reanalysis records. The deep forest method was chosen because it yielded more accurate estimates of O3 concentrations, with an approximate increase of 30% in accuracy, as compared to conventional machine learning techniques such as shallow-layer neural networks and decision trees (e.g., multilayer perceptron and random forest)14. The trained functions between the input variables (satellite observations and meteorological parameters) and output variables (surface O3 measurements) were subsequently applied to produce gridded estimates of O3. As most O3 is concentrated in the stratosphere, the signal of O3 in the lower troposphere observed by nadir-viewing satellites is rather weak18,19. A comprehensive analysis using multiple satellite data sources has indicated that the application of deep learning techniques can achieve reliable and consistent estimation outcomes14. This capability makes it possible to exploit the large volume of existing satellite data to derive surface ozone with high resolution and extensive coverage.
For this purpose, we adopted the LESO estimation framework to generate surface O3 data products for a period of 10 years (2012–2021) in three regions: the Chinese mainland (abbreviated as “China” hereafter), Europe, and the United States (US), including nearly 30 countries in total. The data was obtained at hourly temporal and 0.1° × 0.1° spatial resolutions. The LESO ensemble possesses the capability to investigate the long-term spatiotemporal characteristics of surface O3 concentrations across a wider geographical range than any other currently available datasets. In addition to the statistical validation, the LESO surface O3 datasets were assessed in four scenarios:
- O3 variability during rush hours: O3 in the troposphere is formed by the photochemical reaction involving nitrogen oxides (NOx) that are commonly emitted from combustion exhaust20.
- O3 weekend effect21: Higher O3 concentrations are typically observed on weekends in urban areas22.
- O3 seasonality: O3 pollution events tend to occur in spring and summer when the solar radiation is strong23.
- O3 interannual variability: This can be a result of regulatory policies and/or major social incidents, e.g., the implementation of lockdown measures during the COVID-19 pandemic24.
Methods
Deep-learning model training and validation
Figure 1 illustrates the main procedures involved in generating and validating the LESO ensemble. The generation of the LESO ensemble relies on deep learning algorithms extracting the nonlinear relationship between surface O3 measurements obtained from in-situ environmental monitoring sites and the corresponding satellite/climate data at the same location. The workflow consists of four major steps: data collection, model setup and validation, dataset production, and assessments. The deep learning method considered in this study is the DF21 model (Deep Forest v2021.2.1; ref. 17), which is characterized by its cascading decision forests structure (refer to Fig. 1). The DF21 model was trained and validated using in-situ O3 measurements obtained from local environmental agencies in the three regions (China, Europe and the US). The independent variables driving the DF21 model, as shown in Fig. 1, are the satellite-derived O3 total columns obtained from the Ozone Monitoring Instrument (OMI)25 and the meteorological parameters derived from the fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis of the global climate (ERA5)26. The ERA5 meteorological parameters included shortwave solar radiation, vertical profiles of temperature, relative humidity, wind, U-/V- wind components, rain water content, and O3 mixing ratio (see Fig. 1). The quality of the corresponding data has been evaluated through the use of independent in-situ/satellite-based datasets and global models27. The satellite-derived total columns provide an overview of the spatial distribution of O3 in the atmosphere, whereas the ERA5 data products enhance our understanding of the impact of meteorological conditions on the physicochemical processes involved in the formation and behavior of atmospheric O3.
In addition, we have analyzed the estimation performance using data from the TROPOspheric Monitoring Instrument (TROPOMI) on board the Sentinel-5P satellite, which serves as an independent verification. Further details can be found in the subsequent section. We utilized a total of 4821 in-situ environmental monitoring sites, comprising 1628 sites from the China National Environmental Monitoring Center (CNEMC), 1866 sites from the European Environmental Agency (EEA), and 1327 sites from the Environmental Protection Agency (EPA). The maintenance of data quality for these in-situ measurements is the responsibility of the respective data provider. Please refer to the provided links for more information:
- OMI O3 data28: https://acdisc.gesdisc.eosdis.nasa.gov/data/Aura_OMI_Level3/OMDOAO3e.003/.
- TROPOMI O3 data29: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_NRTI_L3_O3.
- ERA5 global meteorological reanalysis30: https://doi.org/10.24381/cds.143582cf.
- in-situ measurements in China (CNEMC)31: https://air.cnemc.cn:18007.
- in-situ measurements in Europe (EEA)32: https://www.eea.europa.eu/themes/air/explore-air-pollution-data.
- in-situ measurements in the US (EPA)33: https://www.epa.gov/outdoor-air-quality-data.
In this study, the TROPOMI O3 total columns were taken from the near-real-time (NRTI) product, which has a high level of data quality and demonstrates consistent reliability when compared to the offline (OFFL) product34,35. The prompt availability of the NRTI O3 product further highlights its notable advantage in terms of timeliness. It is important to acknowledge that the training process of the DF21 model requires the optimization of fine-tuning parameters. For more detailed information, please refer to the corresponding documents6,15,17.
The DF21 model was validated for all in-situ sites using the Leave-One-Out Cross-Validation (LOOCV) approach that is reliable when dealing with small datasets36. A total of 4821 validations were conducted for the DF21 model. In each validation, the data from one site were held out as the test dataset, while the data from the other sites were used as the training dataset. The gridded feature data were interpolated to the site level using the inverse distance weighting method37. To synchronize with the in-situ measurements and ERA5 meteorological data, the daily satellite O3 total columns were linearly interpolated in time to the hourly timescale. The interpolation used only “good” pixels, identified based on QA flags and a cloud fraction below 10%. The LESO framework distinguishes itself from other data-driven estimation methods12,13,38 by adopting the dynamical networking technique6, which incorporates data from nearby sites to train the model. This technique makes it possible to mitigate the effects of uncertainties arising from factors like topography and regional climatic conditions39,40. To assess the validation performance, we used the coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE), and mean bias error (MBE).
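The site-wise validation and the four metrics named above can be illustrated with a minimal sketch. It is not the LESO implementation: the site values are hypothetical, the function names are introduced here for illustration, and a simple mean-of-training-sites predictor stands in for the DF21 model and the dynamical networking step.

```python
import math

def r2_score(obs, est):
    """Coefficient of determination (R2) of estimates against observations."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rmse(obs, est):
    """Root mean squared error."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))

def mae(obs, est):
    """Mean absolute error."""
    return sum(abs(e - o) for o, e in zip(obs, est)) / len(obs)

def mbe(obs, est):
    """Mean bias error; positive values indicate overestimation."""
    return sum(e - o for o, e in zip(obs, est)) / len(obs)

def leave_one_site_out(site_ids):
    """Yield (held_out_site, training_sites) pairs, one pair per site."""
    for held_out in site_ids:
        yield held_out, [s for s in site_ids if s != held_out]

def mean_model(train_series):
    """Placeholder predictor (NOT DF21): hour-by-hour mean over training sites."""
    n = len(train_series)
    return [sum(values) / n for values in zip(*train_series)]

# Hypothetical hourly O3 concentrations (ug/m3) at three sites.
sites = {
    "A": [40.0, 60.0, 80.0, 60.0],
    "B": [50.0, 70.0, 90.0, 70.0],
    "C": [45.0, 65.0, 85.0, 65.0],
}

scores = {}
for held_out, train_ids in leave_one_site_out(list(sites)):
    est = mean_model([sites[s] for s in train_ids])
    obs = sites[held_out]
    scores[held_out] = {
        "R2": r2_score(obs, est),
        "RMSE": rmse(obs, est),
        "MAE": mae(obs, est),
        "MBE": mbe(obs, est),
    }

for site, metrics in scores.items():
    print(site, {name: round(value, 3) for name, value in metrics.items()})
```

Each site is scored only on data the predictor never saw, so the spread of the per-site scores indicates how well the estimates transfer to unmonitored locations.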
Surface O3 datasets production and assessment
The trained DF21 model was employed to generate datasets of intercontinental surface O3. The production process entailed driving the model with 10-year gridded feature data, including the satellite-derived O3 Level-3 total columns data and ERA5 meteorological reanalysis data. The datasets were generated at four distinct temporal resolutions: hourly, daily, monthly, and yearly. The spatial resolution of the datasets was 0.1° × 0.1°, whereas satellite column densities and meteorological reanalysis had a spatial resolution of 0.25° × 0.25°. This improvement in resolution has proven to be feasible through a comparative analysis of O3 estimations between OMI and TROPOMI over the period of 2019 to 2021. We noticed that using OMI data (0.25° × 0.25°) consistently yielded estimation results comparable to those obtained with TROPOMI data (0.1° × 0.1°). The details of this experiment are presented in Section “Technical Validation”. Besides, our previous work14,15 and Figure S4 in “Supplementary Information” have demonstrated that the variability in O3 total columns derived from different satellites is deemed to be statistically insignificant. The OMI data seem more advantageous because TROPOMI, which was launched in October 2017, only offers Level-3 data products for dates subsequent to late 2018 (refs. 19,34,35). Consequently, the spatiotemporal resolution of the LESO datasets can be adjusted to match that of ERA5.
The LESO ensemble was developed based on our previous short-term regional datasets6,14 and has undergone substantial revision and enhancement, resulting in datasets that offer a greater level of spatial and temporal detail, spanning a period of more than a decade (not described elsewhere). An extensive validation of the LESO datasets has been conducted from three aspects. Firstly, the spatial distribution of LESO surface O3 was compared to ground-level measurements and existing literature. Secondly, the ability of the LESO datasets to accurately replicate widely recognized temporal variation patterns of surface O3 was assessed. Lastly, the effectiveness of the LESO datasets in characterizing spatiotemporal distributions of O3 was examined. The second and third validations were performed to acquire a deeper insight into the data quality of the LESO ensemble, as it is essential for a reliable model dataset to accurately depict the spatiotemporal variations of O3 in the real world. This study focuses on the following variation patterns:
- Spatial variability: We analyzed whether the LESO datasets can reproduce the elevated O3 concentrations during the summer season in eastern China (e.g., the Beijing-Tianjin-Hebei region)41, southern Europe (e.g., Spain and Italy)42, and the western US (e.g., California)43.
- Diurnal variability: We examined the impact of urban road traffic regulations on the nitrogen precursor of O3, which is primarily sourced from the transportation sector44. The concentration of surface O3 is expected to reach its highest level a few hours after the morning rush hour, typically in the mid to late afternoon. This delay is attributed to the time required for photochemical reactions to generate O3 (ref. 45).
- O3 weekend effect: This refers to the phenomenon where the maximum hourly O3 levels during weekends differ from weekday levels, with a decrease or an increase of up to 15%. This effect is believed to be caused by the reduction of nitrogen oxides (NOx) in a VOC-limited O3 formation regime46.
- Seasonality and long-term trends: We analyzed the surface O3 variations in response to reduction policies (such as the plan implemented in 2017 by China47), as well as the impact of the COVID-19 pandemic since the end of 2019 (refs. 24,48).
Data Records
The LESO ensemble comprises the surface O3 datasets over the three regions (see Table 1 for the corresponding geographical range). In the context of this study, the term “China” specifically pertains to the geographical area of the mainland. The term “Europe” stands for the region including a total of 25 countries, namely Albania, Andorra, Austria, Belgium, Bosnia and Herzegovina, Croatia, Czechia, Denmark, France, Germany, Hungary, Ireland, Italy, Luxembourg, Montenegro, Netherlands, Norway, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden, Switzerland, and the United Kingdom of Great Britain and Northern Ireland. The term “US” denotes the geographical area of the continental United States. The datasets were generated at a spatial resolution of 0.1° × 0.1° and across four timescales, i.e., hourly, daily, monthly, and yearly. All of the LESO datasets are available in Zenodo under the Creative Commons Attribution 4.0 International (CC BY 4.0) license:
- Hourly O3 measurements in China49: https://doi.org/10.5281/zenodo.7500780.
- Hourly O3 measurements in Europe50: https://doi.org/10.5281/zenodo.7500782.
- Hourly O3 measurements in the US51: https://doi.org/10.5281/zenodo.7500784.
- Daily, monthly, and yearly O3 measurements in all regions52: https://doi.org/10.5281/zenodo.7502204.
The data files are organized by region and timescale in the Network Common Data Form, version 4 (NetCDF-4) format, following the naming convention outlined in Table 1. As an example, a file named “SUR-O3-2012-01-03-DF21-01.nc” in the “EU-SUR-O3-OMI-Hourly” directory stores the hourly measurements of surface O3 concentrations (version 01) on January 3, 2012 derived from the OMI instrument using the DF21 model (https://deep-forest.readthedocs.io/en/stable/). The open-access built-in processing tool allows for dynamical assessment of the estimation uncertainty at user-defined geolocations53. To spatially extrapolate the site-level uncertainties to the regions of interest, we employed a geographically weighted regression technique.
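The naming convention can be unpacked programmatically. The sketch below is inferred solely from the single example file and directory name quoted above; the regular expression, the field names, and both helper functions are assumptions introduced for illustration, and Table 1 remains the authoritative description.

```python
import re
from datetime import date

# Pattern inferred from one example name only (an assumption, not the
# official specification): SUR-O3-YYYY-MM-DD-<model>-<version>.nc
HOURLY_NAME = re.compile(
    r"^SUR-O3-(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"-(?P<model>[A-Za-z0-9]+)-(?P<version>\d{2})\.nc$"
)

def parse_hourly_name(filename):
    """Split an hourly file name into its date, model, and version fields."""
    match = HOURLY_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return {
        "date": date(int(match["year"]), int(match["month"]), int(match["day"])),
        "model": match["model"],
        "version": match["version"],
    }

def parse_directory_name(dirname):
    """Split a directory name such as 'EU-SUR-O3-OMI-Hourly' (assumed layout)."""
    parts = dirname.split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected directory name: {dirname}")
    return {"region": parts[0], "sensor": parts[3], "timescale": parts[4]}

info = parse_hourly_name("SUR-O3-2012-01-03-DF21-01.nc")
folder = parse_directory_name("EU-SUR-O3-OMI-Hourly")
print(info["date"].isoformat(), info["model"], info["version"])
print(folder["region"], folder["sensor"], folder["timescale"])
```

Parsing the names first makes it straightforward to select files by date range or region before opening any NetCDF-4 file.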
Technical Validation
Statistical validation
The LOOCV results for the LESO ensemble, using a total of 4821 in-situ sites, demonstrated excellent performance. Please refer to Fig. 2 and Table 2 for a summary. The mean values of R2 and RMSE were 0.78 and 12.84 μg/m3, respectively, indicating a strong correlation and relatively small deviation between the predicted and observed measurements. The average site-level concentration of surface O3 in China, Europe, and the US during the summer months was 75, 67, and 65 μg/m3, respectively. In contrast, during the winter months, the average concentrations were 40, 39, and 51 μg/m3 in China, Europe, and the US, respectively. From the hourly timescale to the monthly timescale, the R2 values for the first quartile ranged from 0.67 to 0.80, while the RMSE for the third quartile ranged from 11.71 to 20.30 μg/m3. As expected, the validation results were superior at coarser timescales compared to finer timescales, supported by a higher level of explained O3 variation (R2) and a lower magnitude of estimation errors (RMSE). The R2 value for the hourly timescale (~0.76) was found to be lower than those for the daily, weekly, and monthly timescales by 4.44%, 8.91%, and 11.52%, respectively. The RMSE for the hourly timescale (~16.93 μg/m3) was observed to be higher than those for the daily, weekly, and monthly timescales by 40.93%, 67.16%, and 86.19%, respectively. The interquartile range (IQR) values of R2 for the hourly, daily, weekly, and monthly timescales were 0.20, 0.19, 0.17, and 0.15, respectively. The IQR values of RMSE for the hourly, daily, weekly, and monthly timescales were 7.19, 6.71, 6.69, and 6.52 μg/m3, respectively.
The validation results showed a similar level of performance (R2 and RMSE) between China and Europe, but the results in the US were slightly inferior, particularly at the hourly timescale. On average, across the four timescales, the mean and IQR of R2 were 0.82 and 0.16 in China, 0.83 and 0.18 in Europe, and 0.77 and 0.22 in the US. The mean and IQR values of RMSE were 13.39 and 8.16 μg/m3 in China, 11.24 and 8.40 μg/m3 in Europe, and 11.38 and 8.76 μg/m3 in the US. The R2 in the US was 7% lower than that in China and Europe. However, the corresponding RMSE in the US was 15% lower than that in China. In China, a positive correlation was observed between the R2 and RMSE, while the opposite trend was observed in the US. This contrast may relate to the distribution of in-situ sites and the O3 formation mechanism. The average O3 concentration over multiple years in China, Europe, and the US was 60, 53, and 61 μg/m3, respectively. According to Fig. 1, the in-situ sites in China were primarily situated in heavily O3-polluted areas (mostly in eastern China)6,54,55, whereas many sites in the US were located in regions with low O3 levels, such as the east coastal area56. In addition, the O3 formation pathway varies significantly between the eastern and western areas of the US14. The notable difference in O3 concentration levels between the eastern and western regions of the US can be attributed to the transport of O3 from the stratosphere to the troposphere, a phenomenon known as stratospheric intrusion57. The intrusions usually occur in relatively high-latitude areas like the western part of the US58. In contrast, the O3 formation pathway in China was rather consistent between areas59. This may explain why LESO produced lower R2 and lower RMSE in the US.
The validation outcome for hourly surface O3 formation in the US was arguably promising, considering its complexity. Nevertheless, further validations are required to analyze the spatiotemporal variation characteristics for justifying the long-term reliability of LESO.
Validation of temporal variability
Tropospheric O3 is a secondary air pollutant formed from photochemical reactions60:
$$\mathrm{VOCs} + \mathrm{NO}_x + h\nu \rightarrow \mathrm{O}_3$$
where VOCs and NOx refer to volatile organic compounds and nitrogen oxides, respectively, and hν represents the strength of solar shortwave radiation (SSR). Figure 3a demonstrates that the three regions experienced varied interannual SSR from 2014 to 2021, which can be associated with the variability of surface O3 (refs. 61,62,63). In China, the SSR difference between 2014 and 2021 was smaller than 100 Wm−2, and higher SSR values were found during 2017–2019. In Europe, the SSR exhibited an overall upward trend, and the SSR difference between these years was also smaller than 100 Wm−2. In the US, the SSR difference during the same period reached 110 Wm−2, and an overall downward trend was seen from 2014 to 2019. The SSR in the US has increased rapidly since 2019, reaching 1214 Wm−2 in 2021. Figure 3b compares the LESO ensemble of surface O3 between 2019 and 2021 in the three regions at a spatial resolution of 0.1° × 0.1° using the two products of total O3 from OMI and TROPOMI, respectively. The R2 between the OMI-based and TROPOMI-based surface O3 concentrations was 0.9, and the slope of linear regression was 1.1. The surface O3 estimates based on TROPOMI were 10% higher than those based on OMI. Because TROPOMI provides a higher spatial resolution and a longer operational period, the next version of LESO will consider the TROPOMI O3 data for future long-term use.
The LESO ensemble was validated by analyzing the temporal variation characteristics of surface O3 at four time scales: diurnal cycles (Fig. 4a), weekend effect (Fig. 4b), seasonality (Fig. 4c), and interannual variations (Fig. 4d). The estimated surface O3 concentrations were consistent with the ground-based measurements with a mean difference of less than 1 μg/m3. Both datasets showed strong diurnal patterns caused by urban commutes64: the peak values were observed at 3 PM, while the trough values occurred between 6 and 9 AM. The same findings have been confirmed by the relevant literature65,66. The LESO ensemble accurately reproduced the peak and trough values of surface O3 in China and Europe, but slightly overestimated the values in the early morning and underestimated them at noon in the US.
The estimated and measured surface O3 showed significant weekend effects in all three regions, with higher O3 values on weekends compared to weekdays (see Fig. 4b), which has been discussed in previous studies22,44,46. The estimated daily average O3 values were slightly smaller than the ground-based measurements. The underestimation in China and Europe was approximately 1 μg/m3, whereas in the US it was less than 0.05 μg/m3. Stronger weekend effects were seen in Europe (~2.51 μg/m3) than in the US (~1.41 μg/m3) and China (~0.09 μg/m3).
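A weekend effect of this kind can be computed from a daily series as sketched below. This is an illustration, not the procedure used for Fig. 4b: it assumes that the effect is the weekend mean minus the weekday mean of daily averages, and both the function name and the input values are hypothetical.

```python
from datetime import date, timedelta

def weekend_effect(daily_o3):
    """Return the weekend mean minus the weekday mean of a daily O3 series.

    daily_o3 maps datetime.date objects to daily mean concentrations (ug/m3).
    Saturday and Sunday (weekday() >= 5) are treated as the weekend.
    """
    weekend = [v for d, v in daily_o3.items() if d.weekday() >= 5]
    weekday = [v for d, v in daily_o3.items() if d.weekday() < 5]
    return sum(weekend) / len(weekend) - sum(weekday) / len(weekday)

# Two hypothetical weeks starting on Monday, 7 June 2021.
start = date(2021, 6, 7)
daily_o3 = {}
for offset in range(14):
    day = start + timedelta(days=offset)
    daily_o3[day] = 62.5 if day.weekday() >= 5 else 60.0

effect = weekend_effect(daily_o3)
print(f"weekend effect: {effect:.2f} ug/m3")
```

A positive value indicates higher O3 on weekends, which is the sign reported for all three regions.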
The existing literature20,67 suggests that higher surface O3 concentrations were seen in summer than in winter. This seasonal variability of surface O3 was identified in the three regions from both the LESO datasets and ground-based measurements. The highest O3 concentration in China, Europe, and the US occurred in April, May, and June, respectively. The monthly difference between the measured and estimated O3 was less than 1 μg/m3 in the three regions.
Figure 4d confirms that the LESO datasets can reconstruct realistic interannual variations of O3. The difference between the yearly measured and estimated O3 levels in China and Europe was 0.92 and 1.06 μg/m3, respectively, whereas it was only 0.17 μg/m3 in the US. The existing literature54,68 has highlighted an increasing trend in surface O3 over China, which can be attributed to rapid urbanization and industrialization (e.g., increased combustion and industrial pollutant emissions). As illustrated in Fig. 4d, surface O3 concentrations in China experienced a rapid increase from 2014 to 2018, with an annual growth rate of 3.12 μg/m3. However, between 2018 and 2021, there was a downward trend in O3 concentrations, which was likely due to the implementation of regulations by the authorities to address air pollution issues41,69,70, as well as the lockdowns imposed during the COVID-19 pandemic71,72,73,74. O3 concentrations in Europe exhibited an overall increasing trend, with an annual growth rate of 0.48 μg/m3. These interannual variation patterns of O3 were consistent with those of SSR shown in Fig. 3a. Because Europe has not implemented stringent measures to reduce O3 pollution since the Gothenburg Protocol in 2012 (ref. 75), the intensity of solar radiation can play a significant role in driving O3 variations20. Figure 4d shows that the time series of O3 in the US was generally stable, except for the sudden change in 2021. Likewise, the relevant authority in the US has not issued any other comprehensive reduction plan apart from the Clean Air Act Amendments of 1990 (ref. 76). The variations of SSR (see Fig. 3a) may largely contribute to the observed trend of surface O3 in the US.
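An annual growth rate such as those quoted above can be obtained as the slope of a least-squares line through the yearly means. The text does not state how the reported rates were derived, so the sketch below is one plausible approach under that assumption; the function name and the yearly values are hypothetical.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx

# Hypothetical yearly mean surface O3 concentrations (ug/m3).
years = [2014, 2015, 2016, 2017, 2018]
yearly_o3 = [50.0, 53.5, 55.5, 59.5, 62.0]

slope = linear_trend(years, yearly_o3)
print(f"annual growth rate: {slope:.2f} ug/m3 per year")
```

A regression slope uses every year in the window, so it is less sensitive to a single anomalous year than the difference between the first and last values.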
Validation of spatiotemporal distribution
Figure 5 illustrates the yearly mean of surface O3 from the LESO ensemble in the three regions from 2012 to 2021. In line with the previous studies77,78,79, high levels of O3 were found in the North China Plain, which includes the “Jing-Jin-Ji” area. O3 concentrations in Europe were low in both spatial and temporal domains. As compared to northern Europe, severe O3 pollution was found in southern Europe, which can be caused by the latitudinal distribution of solar radiation20,80,81,82. In the US, O3 concentrations were generally stable, with relatively low levels observed between 2016 and 2019. The spatial distribution of O3 differed considerably between the western and eastern parts of the US. Specifically, the western part consistently exhibited higher O3 concentrations than the eastern part throughout the years. This spatial discrepancy agrees with earlier findings57,58 and may be a result of stratospheric intrusions. Regarding the difference in interannual variations relative to the site-level estimates (see Fig. 4d), the possible factors are: (1) the number of sites varied throughout the years, and (2) the sites were located mainly in highly polluted areas6,14. The spatial distribution of O3 in China, as characterized by the LESO ensemble, agrees with the findings of recent studies83,84. Unfortunately, there are no other available data products for validating the LESO ensemble in Europe and the US.
Furthermore, the LESO ensemble was validated with the GEOS-Chem (GEOS: Goddard Earth Observing System) model85, the Community Multiscale Air Quality (CMAQ) model86, and the ECMWF Atmospheric Composition Reanalysis 4 (EAC4) model87. Both GEOS-Chem and CMAQ models have been extensively applied for simulating air pollutants88,89, and the EAC4 model was recently used for validating total ozone columns from TROPOMI90. Figure S1 shows that the GEOS-Chem model seemed to significantly overestimate surface O3 in all three regions, whereas the EAC4 model yielded lower estimates than the LESO ensemble. The LESO, CMAQ, and EAC4 model datasets exhibited similar spatial patterns of O3 in the US, and the CMAQ model captured more detailed spatial features in the western region of the US. The comparison between LESO, CMAQ, and EAC4 (see Figure S2) confirmed that the EAC4 model underestimated O3 concentrations. Figure S3 shows the site-level validations of the EAC4 and CMAQ models using ground-based measurements. The median R2 values of the EAC4 model in all the regions were below 0.6, while the median R2 of the CMAQ model in the US was about 0.45. The validation results of the CMAQ and EAC4 models appeared to be worse than those of LESO (see Fig. 2). As a detailed elaboration is beyond the scope of this study, readers are kindly directed to “Supplementary Information”.
Usage Notes
The LESO surface O3 datasets were generated separately for four timescales and three regions. As the data size varies greatly across different timescales, such as from 154 GB for the hourly timescale to 18.1 MB for the yearly timescale in China, we recommend that users download data based on their specific timescale of interest.
Since O3 column data from polar-orbiting satellites are typically provided at a daily or even coarser timescale, it is necessary to temporally interpolate the satellite data in order to generate hourly measurements of surface O3. The accuracy of ERA5 hourly meteorological data can be crucial for estimating diurnal variations of surface O3. According to the technical validation results, the LESO ensemble can accurately capture the hourly variability of surface O3. The LESO O3 ensemble in China and Europe has demonstrated greater reliability at the hourly timescale. However, we advise users to exercise caution when using the hourly dataset in the US. The LESO datasets at the other timescales showed similar regional results, indicating no need for additional caution.
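The temporal interpolation mentioned in this note can be pictured with a short sketch. It assumes one valid total-column value per day, spaced exactly 24 hours apart, and plain linear interpolation between consecutive days; the function name and the column values are hypothetical, and the quality filtering of pixels is omitted.

```python
def interpolate_hourly(daily_values):
    """Linearly interpolate values spaced 24 h apart onto hourly steps.

    Returns 24 * (n - 1) + 1 values for n daily inputs, so the first and
    last outputs coincide with the first and last daily values.
    """
    hourly = []
    for left, right in zip(daily_values, daily_values[1:]):
        for hour in range(24):
            hourly.append(left + (right - left) * hour / 24)
    hourly.append(daily_values[-1])
    return hourly

# Hypothetical daily O3 total columns in Dobson units (DU).
daily_columns = [300.0, 324.0, 312.0]
hourly_columns = interpolate_hourly(daily_columns)

print(len(hourly_columns))   # 49 hourly values spanning two days
print(hourly_columns[12])    # halfway between day 1 and day 2: 312.0
print(hourly_columns[36])    # halfway between day 2 and day 3: 318.0
```

Because the interpolated column varies smoothly within a day, any sub-daily structure in the hourly surface estimates has to come from the hourly meteorological inputs, which is why the accuracy of the ERA5 data matters for diurnal variations.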
The LESO surface O3 datasets were produced at the spatial resolution of 0.1° × 0.1°. In case users require datasets with a lower spatial resolution, please contact Songyan Zhu (szhu4@ed.ac.uk) or Jian Xu (xujian@nssc.ac.cn). The current LESO ensemble is publicly available for surface O3 measurements in China, Europe, and the US. The authors are also willing to test and apply the LESO framework to other regions of the world, provided that in-situ measurements are available.
Code availability
The scripts for processing and reading the LESO datasets are accessible on GitHub (https://github.com/soonyenju/LESO) under the MIT license. The tools and libraries used to build the LESO framework for generating datasets of surface O3 concentrations include Python v3.9, Numpy v1.20.3, Xarray v0.19.0, Pandas v1.3.3, Deep Forest v2021.2.1 (DF21), scigeo v0.0.13, and sciml v0.0.5. The validation of LESO datasets was processed using scitbx v0.0.42 and scikit-learn v0.24.2.
References
Fuller, R. et al. Pollution and health: a progress update. The Lancet Planetary Health 6, e535–e547 (2022).
Feng, Z. et al. Ozone pollution threatens the production of major staple crops in East Asia. Nature Food 3, 47–56 (2022).
U.S. Environmental Protection Agency (EPA). Ground-level Ozone Basics https://www.epa.gov/ground-level-ozone-pollution/ground-level-ozone-basics (2023).
World Health Organization (ed.) WHO global air quality guidelines. Particulate matter (PM2.5 and PM10), ozone, nitrogen dioxide, sulfur dioxide and carbon monoxide (Geneva: World Health Organization, 2021).
Zhang, J., Wei, Y. & Fang, Z. Ozone pollution: a major health hazard worldwide. Frontiers in Immunology 10, 2518 (2019).
Zhu, S. et al. LEarning Surface Ozone from satellite columns (LESO): A regional daily estimation framework for surface ozone monitoring in China. IEEE Transactions on Geoscience and Remote Sensing 60, 4108711 (2022).
Unger, N., Zheng, Y., Yue, X. & Harper, K. L. Mitigation of ozone damage to the world’s land ecosystems by source sector. Nature Climate Change 10, 134–137 (2020).
Zhu, S. et al. Investigating Impacts of Ambient Air Pollution on the Terrestrial Gross Primary Productivity (GPP) From Remote Sensing. IEEE Geoscience and Remote Sensing Letters 19, 1–5 (2022).
Meehl, G. A. et al. Future heat waves and surface ozone. Environmental Research Letters 13, 064004 (2018).
Zanis, P. et al. Climate change penalty and benefit on surface ozone: a global perspective based on CMIP6 earth system models. Environmental Research Letters 17, 024014 (2022).
Doherty, R. et al. Impacts of climate change on surface ozone and intercontinental ozone pollution: A multi-model study. Journal of Geophysical Research: Atmospheres 118, 3744–3763 (2013).
Wang, Y., Yuan, Q., Li, T., Zhu, L. & Zhang, L. Estimating daily full-coverage near surface O3, CO, and NO2 concentrations at a high spatial resolution over China based on S5P-TROPOMI and GEOS-FP. ISPRS Journal of Photogrammetry and Remote Sensing 175, 311–325 (2021).
Li, T. & Cheng, X. Estimating daily full-coverage surface ozone concentration using satellite observations and a spatiotemporally embedded deep learning approach. International Journal of Applied Earth Observation and Geoinformation 101 (2021).
Zhu, S. et al. Satellite-derived estimates of surface ozone by LESO: Extended application and performance evaluation. International Journal of Applied Earth Observation and Geoinformation 113, 103008 (2022).
Zhu, S. et al. Estimating near-surface concentrations of major air pollutants from space: A universal estimation framework LAPSO. IEEE Transactions on Geoscience and Remote Sensing 61, 4101011 (2023).
Zhou, Z.-H. & Feng, J. Deep Forest: towards an alternative to deep neural networks. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 3553–3559 (2017).
Zhou, Z.-H. & Feng, J. Deep forest. National Science Review 6, 74–86 (2019).
Xu, J., Schüssler, O., Rodriguez, D. G. L., Romahn, F. & Doicu, A. A novel ozone profile shape retrieval using full-physics inverse learning machine (FP-ILM). IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10, 5442–5457 (2017).
Hubert, D. et al. TROPOMI tropospheric ozone column data: geophysical assessment and comparison to ozonesondes, GOME-2B and OMI. Atmospheric Measurement Techniques 14, 7405–7433 (2021).
Seinfeld, J. H. & Pandis, S. N. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change, third edn (Wiley, New York, United States, 2016).
Cleveland, W. S., Graedel, T. E., Kleiner, B. & Warner, J. L. Sunday and workday variations in photochemical air pollutants in New Jersey and New York. Science 186, 1037–1038 (1974).
Sicard, P. et al. Ozone weekend effect in cities: Deep insights for urban air pollution control. Environmental Research 191, 110193 (2020).
Logan, J. A. Tropospheric ozone: Seasonal behavior, trends, and anthropogenic influence. Journal of Geophysical Research: Atmospheres 90, 10463–10482 (1985).
Venter, Z. S., Aunan, K., Chowdhury, S. & Lelieveld, J. COVID-19 lockdowns cause global air pollution declines. Proceedings of the National Academy of Sciences 117, 18984–18990 (2020).
Veefkind, J. P., de Haan, J. F., Brinksma, E. J., Kroon, M. & Levelt, P. F. Total ozone from the Ozone Monitoring Instrument (OMI) using the DOAS technique. IEEE Transactions on Geoscience and Remote Sensing 44, 1239–1244 (2006).
Hersbach, H. et al. The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society 146, 1999–2049 (2020).
Muñoz-Sabater, J. et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth System Science Data 13, 4349–4383 (2021).
OMI/Aura Ozone (O3) DOAS Total Column L3 1 day 0.25 degree x 0.25 degree V3 https://acdisc.gesdisc.eosdis.nasa.gov/data/Aura_OMI_Level3/OMDOAO3e.003/ (2023).
Sentinel-5P NRTI O3: Near Real-Time Ozone https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_NRTI_L3_O3/ (2023).
Complete ERA5 global atmospheric reanalysis https://doi.org/10.24381/cds.143582cf (2023).
CNEMC Real-time National Air Quality Data https://air.cnemc.cn:18007 (2023).
EEA Air Quality Data https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm (2023).
EPA Air Data: Air Quality Data Collected at Outdoor Monitors Across the US https://www.epa.gov/outdoor-air-quality-data (2023).
Garane, K. et al. TROPOMI/S5P total ozone column data: global ground-based validation and consistency with other satellite missions. Atmospheric Measurement Techniques 12, 5263–5287 (2019).
Loyola, D. G., Xu, J., Heue, K.-P. & Zimmer, W. Applying FP_ILM to the retrieval of geometry-dependent effective lambertian equivalent reflectivity (GE_LER) daily maps from UVN satellite measurements. Atmospheric Measurement Techniques 13, 985–999 (2020).
Marchetti, F. The extension of Rippa’s algorithm beyond LOOCV. Applied Mathematics Letters 120, 107262 (2021).
Bartier, P. M. & Keller, C. P. Multivariate interpolation to incorporate thematic surface data using inverse distance weighting (IDW). Computers & Geosciences 22, 795–799 (1996).
Chen, G. et al. Improving satellite-based estimation of surface ozone across China during 2008–2019 using iterative random forest model and high-resolution grid meteorological data. Sustainable Cities and Society 69, 102807 (2021).
Zhu, S. et al. Satellite remote sensing of daily surface ozone in a mountainous area. IEEE Geoscience and Remote Sensing Letters 19, 1004005 (2022).
Zhu, S. et al. An optimization approach for hourly ozone simulation: A case study in Chongqing, China. IEEE Geoscience and Remote Sensing Letters 18, 1871–1875 (2021).
Li, K. et al. Anthropogenic drivers of 2013–2017 trends in summer surface ozone in China. Proceedings of the National Academy of Sciences 116, 422–427 (2019).
Guerreiro, C. B. B., Foltescu, V. & de Leeuw, F. Air quality status and trends in Europe. Atmospheric Environment 98, 376–384 (2014).
Singh, H., Cai, C., Kaduwela, A., Weinheimer, A. & Wisthaler, A. Interactions of fire emissions and urban pollution over California: Ozone formation and air quality simulations. Atmospheric Environment 56, 45–51 (2012).
Gao, H. O. Day of week effects on diurnal ozone/NOx cycles and transportation emissions in Southern California. Transportation Research Part D: Transport and Environment 12, 292–305 (2007).
UCAR. Ozone in the Troposphere (2022).
Heuss, J. M., Kahlbaum, D. F. & Wolff, G. T. Weekday/weekend ozone differences: what can we learn from them? Journal of the Air & Waste Management Association 53, 772–788 (2003).
Zhao, H. et al. Coordinated control of PM2.5 and O3 is urgently needed in China after implementation of the “Air pollution prevention and control action plan”. Chemosphere 270, 129441 (2021).
Chen, K., Wang, M., Huang, C., Kinney, P. L. & Anastas, P. T. Air pollution reduction and mortality benefit during the COVID-19 outbreak in China. The Lancet Planetary Health 4, E210–E212 (2020).
Zhu, S. & Xu, J. LESO-CN-O3-HOURLY. Zenodo https://doi.org/10.5281/zenodo.7500780 (2023).
Zhu, S. & Xu, J. LESO-EU-O3-HOURLY. Zenodo https://doi.org/10.5281/zenodo.7500782 (2023).
Zhu, S. & Xu, J. LESO-US-O3-HOURLY. Zenodo https://doi.org/10.5281/zenodo.7500784 (2023).
Zhu, S. & Xu, J. LESO-CN&EU&US-O3-DAILY/MONTHLY/YEARLY. Zenodo https://doi.org/10.5281/zenodo.7502204 (2023).
Zhu, S. & Xu, J. LESO-uncertainty. Zenodo https://doi.org/10.5281/zenodo.8183290 (2023).
Wang, T. et al. Ozone pollution in China: A review of concentrations, meteorological influences, chemical precursors, and effects. Science of the Total Environment 575, 1582–1596 (2017).
Lu, X. et al. Severe surface ozone pollution in China: a global perspective. Environmental Science & Technology Letters 5, 487–494 (2018).
Archer, C. L., Brodie, J. F. & Rauscher, S. A. Global warming will aggravate ozone pollution in the US Mid-Atlantic. Journal of Applied Meteorology and Climatology 58, 1267–1278 (2019).
Lin, M. et al. Springtime high surface ozone events over the western United States: Quantifying the role of stratospheric intrusions. Journal of Geophysical Research: Atmospheres 117 (2012).
Sprenger, M. & Wernli, H. A northern hemispheric climatology of cross-tropopause exchange for the ERA15 time period (1979–1993). Journal of Geophysical Research: Atmospheres 108 (2003).
Jin, X. & Holloway, T. Spatial and temporal variability of ozone sensitivity over China observed from the Ozone Monitoring Instrument. Journal of Geophysical Research: Atmospheres 120, 7229–7246 (2015).
Choi, Y., Kim, H., Tong, D. & Lee, P. Summertime weekly cycles of observed and modeled NOx and O3 concentrations as a function of satellite-derived ozone production sensitivity and land use types over the Continental United States. Atmospheric Chemistry and Physics 12, 6291–6307 (2012).
Kerr, G. H., Waugh, D. W., Steenrod, S. D., Strode, S. A. & Strahan, S. E. Surface ozone-meteorology relationships: Spatial variations and the role of the jet stream. Journal of Geophysical Research: Atmospheres 125, e2020JD032735 (2020).
Wang, Y. et al. Contrasting trends of PM2.5 and surface-ozone concentrations in China from 2013 to 2017. National Science Review 7, 1331–1339 (2020).
Kou, W. et al. High downward surface solar radiation conducive to ozone pollution more frequent under global warming. Science Bulletin 68, 388–392 (2023).
Li, T. et al. Short-term effects of multiple ozone metrics on daily mortality in a megacity of China. Environmental Science and Pollution Research 22, 8738–8746 (2015).
Cichowicz, R. & Stelęgowski, A. Average hourly concentrations of air contaminants in selected urban, town, and rural sites. Archives of Environmental Contamination and Toxicology 77, 197–213 (2019).
Saini, R., Singh, P., Awasthi, B. B., Kumar, K. & Taneja, A. Ozone distributions and urban air quality during summer in Agra–a world heritage site. Atmospheric Pollution Research 5, 796–804 (2014).
National Research Council. Rethinking the ozone problem in urban and regional air pollution (National Academies Press, 1992).
Li, A., Zhou, Q. & Xu, Q. Prospects for ozone pollution control in China: An epidemiological perspective. Environmental Pollution 285, 117670 (2021).
Zheng, B. et al. Trends in China’s anthropogenic emissions since 2010 as the consequence of clean air actions. Atmospheric Chemistry and Physics 18, 14095–14111 (2018).
Dang, R. & Liao, H. Radiative forcing and health impact of aerosols and ozone in China as the consequence of clean air actions over 2012–2017. Geophysical Research Letters 46, 12511–12519 (2019).
Zhao, Y. et al. Substantial changes in nitrogen dioxide and ozone after excluding meteorological impacts during the COVID-19 outbreak in mainland China. Environmental Science & Technology Letters 7, 402–408 (2020).
Wang, H. et al. Seasonality and reduced nitric oxide titration dominated ozone increase during COVID-19 lockdown in eastern China. npj Climate and Atmospheric Science 5, 1–7 (2022).
Zhang, K. et al. Insights into the significant increase in ozone during COVID-19 in a typical urban city of China. Atmospheric Chemistry and Physics 22, 4853–4866 (2022).
Yin, H. et al. Unprecedented decline in summertime surface ozone over eastern China in 2020 comparably attributable to anthropogenic emission reductions and meteorology. Environmental Research Letters 16, 124069 (2021).
Amann, M. et al. Cost-effective emission reductions to improve air quality in Europe in 2020: Analysis of policy options for the EU for the revision of the Gothenburg Protocol. Tech. Rep., International Institute for Applied Systems Analysis, Laxenburg, Austria (2011).
Waxman, H. A. An overview of the clean air act amendments of 1990. Environmental Law 21, 1721–1816 (1991).
Wang, Z.-B., Li, J.-X. & Liang, L.-W. Spatio-temporal evolution of ozone pollution and its influencing factors in the Beijing-Tianjin-Hebei Urban Agglomeration. Environmental Pollution 256, 113419 (2020).
Wang, Y. H. et al. Ozone weekend effects in the Beijing–Tianjin–Hebei metropolitan area, China. Atmospheric Chemistry and Physics 14, 2419–2429 (2014).
Jie, W., Ying, X. & Bing, Z. Projection of PM2.5 and ozone concentration changes over the Jing-Jin-Ji region in China. Atmospheric and Oceanic Science Letters 8, 143–146 (2015).
Querol, X. et al. On the origin of the highest ozone episodes in Spain. Science of the Total Environment 572, 379–389 (2016).
Ikhlasse, H., Benjamin, D., Vincent, C. & Hicham, M. Environmental impacts of pre/during and post-lockdown periods on prominent air pollutants in France. Environment, Development and Sustainability 23, 14140–14161 (2021).
Aas, W. et al. Monitoring of long-range transported air pollutants in Norway. NILU report 13/2021 (Norwegian Institute for Air Research, Kjeller, Norway, 2021).
Liu, Y. & Wang, T. Worsening urban ozone pollution in China from 2013 to 2017–Part 1: The complex and varying roles of meteorology. Atmospheric Chemistry and Physics 20, 6305–6321 (2020).
Liu, R. et al. Spatiotemporal distributions of surface ozone levels in China from 2005 to 2017: A machine learning approach. Environment International 142, 105823 (2020).
Bey, I. et al. Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation. Journal of Geophysical Research: Atmospheres 106, 23073–23095 (2001).
Appel, K. W. et al. The Community Multiscale Air Quality (CMAQ) model versions 5.3 and 5.3.1: system updates and evaluation. Geoscientific Model Development 14, 2867–2897 (2021).
Inness, A. et al. The CAMS reanalysis of atmospheric composition. Atmospheric Chemistry and Physics 19, 3515–3556 (2019).
Wang, L. et al. Source apportionment of atmospheric mercury pollution in China using the GEOS-Chem model. Environmental Pollution 190, 166–175 (2014).
Liu, X.-H. et al. Understanding of regional air pollution over China using CMAQ, part I performance evaluation and seasonal variation. Atmospheric Environment 44, 2415–2426 (2010).
Inness, A. et al. Monitoring and assimilation tests with TROPOMI data in the CAMS system: near-real-time total column ozone. Atmospheric Chemistry and Physics 19, 3939–3962 (2019).
Acknowledgements
The authors thank Paul Palmer for his suggestions on the data validation and English writing. This work was supported in part by the Open Research Fund of the Key Laboratory of Meteorology and Ecological Environment of Hebei Province under Grant Z202201H, in part by the National Natural Science Foundation of China under Grant 42375142, in part by the Open Fund of Innovation Center for FengYun Meteorological Satellite (FYSIC) and “FengYun Application Pioneering Project” under Grant FY-APP-ZX-2022.0214, and in part by the Chinese Academy of Sciences (CAS) Pioneering Initiative Talents Program under Grant E1RC2WB2.
Author information
Contributions
J.X. and S.Z. conceived the work; S.Z. conducted the model training and data processing; J.X. conducted the data validation; J.Z., C.Y., Y.W., H.W. and J.S. contributed to data validation and interpretation; S.Z. and J.X. prepared the manuscript with significant contributions from all co-authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhu, S., Xu, J., Zeng, J. et al. LESO: A ten-year ensemble of satellite-derived intercontinental hourly surface ozone concentrations. Sci Data 10, 741 (2023). https://doi.org/10.1038/s41597-023-02656-4