Parsimonious estimation of hourly surface ozone concentration across China during 2015–2020

Surface ozone is a major air pollutant that harms human health and vegetation productivity, particularly in China. However, high-resolution surface ozone concentration data are still lacking, which hinders accurate assessment of the associated environmental impacts. Here, we collected hourly ground-level ozone observations (over 6 million records), remote sensing products, meteorological data, and socioeconomic information, and applied recurrent neural networks to map hourly surface ozone data (HrSOD) at a 0.1° × 0.1° resolution across China during 2015–2020. The coefficient of determination (R²) values in sample-based, site-based, and by-year cross-validations were 0.72, 0.65, and 0.71, respectively, with root mean square error (RMSE) values of 11.71 ppb (mean = 30.89 ppb), 12.81 ppb (mean = 30.96 ppb), and 11.14 ppb (mean = 31.26 ppb). Moreover, the HrSOD shows high spatiotemporal consistency with ground-level observations at different time scales (diurnal, seasonal, and annual) and at different spatial levels (individual sites and regions). The HrSOD thus provides critical information for fine-resolution assessment of the impacts of surface ozone on the environment and human well-being.
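The two validation metrics reported above, and the idea behind a site-based split, can be illustrated with a minimal sketch. It assumes that R² is computed as 1 − SS_res/SS_tot (the study may instead use the squared Pearson correlation) and that a site-based cross-validation keeps all records from one monitoring site in the same fold. The function names and toy values are hypothetical and are not taken from the study.

```python
import math


def rmse(obs, est):
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def r_squared(obs, est):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def site_based_folds(site_ids, n_folds):
    """Assign each record a fold index so that a site never spans two folds."""
    unique_sites = sorted(set(site_ids))
    fold_of_site = {site: i % n_folds for i, site in enumerate(unique_sites)}
    return [fold_of_site[site] for site in site_ids]


# Toy hourly ozone values in ppb (illustrative only).
observed = [30.0, 40.0, 20.0, 35.0]
estimated = [28.0, 43.0, 22.0, 33.0]
print(rmse(observed, estimated), r_squared(observed, estimated))
print(site_based_folds(["A", "B", "A", "C", "B"], n_folds=2))
```

Holding out whole sites tests how well estimates transfer to locations without monitors, which is one plausible reason the site-based R² (0.65) is lower than the sample-based value (0.72).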


Figure S1. Mean annual percentage of days with valid OMI data per grid cell from 2015 to 2020.

Figure S2. Comparison between original and filled OMI ozone column concentrations on July 1, 2018.

Figure S5. Training and testing loss of the LSTM model at each epoch.

Figure S8. Comparison between model-estimated surface ozone concentrations and observations across China in 2014 and 2021.

Figure S9. Spatial patterns of ozone concentrations from HrSOD and OMI remote sensing products across China in 2015.

Figure S10. Mean monthly surface ozone concentrations and partial correlations between regional surface ozone concentrations and meteorological factors at the hourly scale in four megacity clusters: BTH, PRD, SCB, and YRD. BTH: Beijing-Tianjin-Hebei region; SCB: Sichuan Basin; PRD: Pearl River Delta; YRD: Yangtze River Delta.

Figure S13. Comparison of the LSTM algorithm with ConvLSTM algorithms. Data from 2015 were used, split into 80% training and 20% testing data.

Table S1. Summary of the data sources used in this study.

Table S2. Detailed results of model tests at eight lookback windows.

Table S3. Comparison between different models. Data from 2015 were used, split into 80% training and 20% testing data.
Parameters refers to the total number of trainable parameters in a neural network model, a measure of the model's complexity and capacity. GFLOPs refers to the total number of floating-point operations performed by the model, a measure of its computational cost (1 GFLOPs = 10⁹ FLOPs).
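As a rough illustration of how these two quantities arise for a single LSTM layer, the sketch below counts trainable parameters and estimates matrix-multiplication FLOPs. It assumes the standard four-gate LSTM cell. The layer sizes are hypothetical and not those of the models in Table S3, and the FLOP estimate counts only the gate matrix multiplications (two operations per multiply-accumulate), ignoring element-wise operations and any output layer. Counting conventions differ between tools, so such figures are indicative only.

```python
def lstm_layer_params(input_size, hidden_size, double_bias=False):
    """Trainable parameters of one LSTM layer with four gates.

    Each gate has an input weight matrix, a recurrent weight matrix, and a
    bias vector. Some frameworks (e.g. PyTorch) store two bias vectors per
    gate; set double_bias=True to count them that way.
    """
    n_bias = 2 if double_bias else 1
    return 4 * (hidden_size * (input_size + hidden_size) + n_bias * hidden_size)


def lstm_layer_matmul_flops(input_size, hidden_size, seq_len):
    """Approximate FLOPs of the gate matrix multiplications over a sequence.

    Assumption: 2 FLOPs per multiply-accumulate; element-wise gate
    operations and activations are ignored.
    """
    return 2 * 4 * hidden_size * (input_size + hidden_size) * seq_len


def to_gflops(flops):
    """Convert a raw FLOP count to GFLOPs (1 GFLOPs = 10**9 FLOPs)."""
    return flops / 1e9


# Hypothetical layer: 10 input features, 64 hidden units, 24-hour lookback.
print(lstm_layer_params(10, 64))
print(to_gflops(lstm_layer_matmul_flops(10, 64, 24)))
```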