Introduction

Atmospheric methane (CH4) is the second most important greenhouse gas (GHG), trailing only carbon dioxide (CO2), and is responsible for more than one-quarter of the global radiative forcing of GHGs since pre-industrial times1,2. Although the atmospheric residence time of CH4 is relatively short (8–12 years), changes in CH4 could have profound impacts on future climate and the oxidative capacity of the global atmosphere1,2,3. CH4 originates from both natural and anthropogenic sources, including biomass burning, the oil/gas industry, livestock, landfills, waste management, wetlands and rice cultivation2,4,5. Surface observations showed that atmospheric CH4 rapidly increased from 1580 ppb in the 1980s to 1910 ppb in 20222,6. The specific reason for this increase remains unclear, yet it is most likely associated with increasing anthropogenic activities4,7, which could significantly offset the climate benefits of carbon emission reductions8.

In 1978, Blake, et al.9 began to measure tropospheric CH4 worldwide and revealed a global increasing trend. Over time, several worldwide measurement networks for atmospheric CH4 concentrations have been established (e.g., the Global Atmosphere Watch (GAW) programme)10,11,12, using ground-based instruments and tower, shipboard and aircraft sampling13,14. However, current monitoring networks do not provide sufficient spatial coverage. Although satellite instruments, such as the Atmospheric Infrared Sounder (AIRS), Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), Greenhouse Gases Observing Satellite (GOSAT) and TROPOspheric Monitoring Instrument (TROPOMI), offer better data coverage, inconsistencies are commonly found between satellite retrievals and in-situ observations due to differences in observational density, accuracy and precision15,16,17. For example, the sensitivities of thermal emission instruments in the thermal infrared (TIR) are low in the lower troposphere because they rely on the thermal contrast between the surface and the atmosphere18,19. In addition, small errors in satellite retrievals may result in large errors in CH4 emission estimates20. Lu, et al.16 combined in-situ observations and satellite retrievals to quantify CH4 emissions comprehensively using an analytical inversion method. Surface observations can provide not only strong constraints for the inversion but also critical correlative components, such as methane isotopes and ethane18. In-situ observations with higher accuracy can also help to evaluate the ability of satellites to detect CH4 anomalies (e.g., large leaks from facilities)18,21. Therefore, there is a great need to build a comprehensive atmospheric CH4 observing system.

China is the world’s largest producer and consumer of coal22,23 and has been the world’s largest anthropogenic CH4 emitter by country since the 2000s24. China also has the largest wetland area (rice paddies and natural wetlands) in Asia, which produces additional CH425,26,27. However, only three surface CH4 observation sites have been established in China under the GAW of the World Meteorological Organization (WMO) since the 1990s28. Several short-term CH4 measurement campaigns have also been organized29,30, but measurements of CH4 are still very limited compared with the ~1600 air quality monitoring sites across China. Air quality monitoring stations are mainly distributed in urban areas31, while GAW sites focus on regional and continental background levels of CH4 concentration. High CH4 concentrations from anthropogenic activities therefore remain underrepresented. Both a sufficient number and optimal locations32 of CH4 monitoring stations are essential for satellite data evaluation and assimilation, inversion of emission estimates and reconstruction of CH4 concentrations21. The optimal distribution of sensors refers to the locations of a limited number of sensors from which the most accurate spatiotemporal distribution of ground-level CH4 concentrations can be derived. However, identifying optimal locations by brute-force or exhaustive search is computationally expensive when the number of sensors and possible locations is large. Previous studies have implemented sparse reconstruction techniques to identify optimal locations of sparse monitoring sites for better reconstruction of spatiotemporal distributions. For instance, different sensor placement algorithms within the proper orthogonal decomposition (POD) framework were applied in the reconstruction of unsteady flow33, ocean surface temperatures34, and surface PM2.532.

Given the essential role of CH4 in the carbon cycle and climate change mitigation, the significance of CH4 monitoring has been recognized by China’s government. To guide the development of a CH4 observing network in China, we infer spatiotemporal variations of CH4 in China using the Weather Research and Forecasting model (WRF) with an updated GHG module (WRF-GHG)35 and satellite-constrained emission estimates36. We further investigate the reconstruction accuracy of four sensor placement algorithms for optimal planning of monitoring sites to better depict surface CH4 concentrations in China. The results can provide valuable references for future planning of CH4 monitoring sites.

Results

Model evaluation

Daily surface CH4 from ground-based instruments and column mixing ratios of CH4 from the GOSAT satellite were used to evaluate the performance of WRF-GHG in simulating CH4. As shown in Fig. 1, high values of CH4 dry column mixing ratios are found in the Sichuan Basin (SCB), especially in summer (approximately 1930 ppb), due to emissions from paddy fields, livestock and energy activities37,38 and unfavorable dispersion conditions39. Relatively large CH4 columns can be seen over eastern China, driven by coal mining (northern part)22 and rice paddy fields (southern part)26,38, reaching a maximum in summer (1910 ppb) and a minimum in winter (1870 ppb). WRF-GHG reproduces the distribution pattern of the CH4 column, with differences lower than 10 ppb during autumn and winter (Fig. 1). Although the simulation may overestimate the CH4 column in spring and summer, the locations of CH4 hot spots, such as eastern and southern China, are generally consistent. The hydroxyl radical (OH) is the main CH4 sink in the troposphere and reaches high values over India and western China40, where the simulated CH4 column is 10 to 30 ppb higher than observations (Supplementary Fig. 1). As CH4 is treated as a passive tracer in WRF-GHG without chemical reactions, the overestimation of CH4 over western China might be mainly caused by the missing chemical loss of CH4. Compared with surface observations, the WRF-GHG model can capture the daily variation of CH4 with a correlation coefficient of 0.67 (Fig. 2). The simulated values are generally higher than the observations, especially during summer and autumn, which are the two main periods of CH4 emissions from livestock and vegetation in Qinghai province38. Because the Mt. Waliguan (WLG) station is isolated from anthropogenic activities, it provides background CH4 within the Eurasian continent rather than local conditions41. The locations of current sites are not in high CH4 centers either28. The performance of the model, as well as of satellite retrievals, therefore cannot be evaluated comprehensively with the limited monitoring sites in China, and more CH4 observation stations are required to support relevant research activities.

Fig. 1: Model evaluation of CH4 distribution.

Seasonal distributions of dry column mixing ratios of CH4 (unit: ppb) from WRF-GHG (a–d) and GOSAT (e–h) in 2017.

Fig. 2: Model evaluation of surface CH4.

a Daily variations of surface CH4 (unit: ppb) from observation (green) and WRF-GHG (blue) in 2017. b Scatter plot of simulated and observed surface CH4.

Reconstruction accuracy of four sensor placement algorithms

Four sensor placement algorithms were employed to estimate the distribution of surface CH4 across China. The results generated by the different algorithms using 10 POD modes (n = 10) are shown in Table 1. When the number of sensors equals the number of POD modes (P = n), the reconstructions generated by the matrix condition number (MCN) and Extrema algorithms perform relatively poorly, with high mean percentage error (MPE) and root-mean-square error (RMSE) values. Their reconstruction performance improves significantly when the sensor quantity exceeds the number of modes (P = 1.5n and P = 2n). Compared with the other algorithms, Extrema exhibits the poorest ability, with a coefficient of determination (R2) of 0.34 and 0.44 under the P = 1.5n and P = 2n conditions, respectively. The evaluation metrics of DEIM and QR pivot under the P = n condition are substantially better than those of MCN and Extrema. Because the sensor quantity must equal the dimension of the POD basis42, the DEIM algorithm only produces a CH4 reconstruction under the P = n condition. QR pivot shows the best performance among the four algorithms, with the largest R2 of 0.86.

Table 1 10-fold cross validation of reconstruction accuracy of four sensor placement algorithms (n and P denote the number of POD modes and sensors, unit of MPE: %, unit of RMSE: ppb).

Figure 3 shows the spatial distributions of surface CH4 concentrations over China from the four algorithms and the associated sensor locations under an oversampled case (P = 2n). The distributions of surface CH4 are closely related to emission sources (Fig. 3a). The intense coal production activities in Shanxi and Guizhou provinces account for 35% and 28% of national coal mine CH4 emissions, respectively (Sheng et al., 2019), resulting in surface CH4 as high as 2700 ppb. All four algorithms can generally capture the main spatial features of surface CH4. Although the computational cost of the Extrema algorithm is the lowest34, it overestimates CH4 over eastern China, with the largest MPE of 5.44%. Owing to the proximity or coincidence of the extrema of different POD modes43, its sensors are densely clustered (Fig. 3g). Unlike Extrema, the MCN algorithm distributes sensors evenly across China (Fig. 3f) and shows a better performance, with R2 of 0.75. The sensors from the QR pivot algorithm are mainly concentrated in high CH4 regions, providing the best CH4 reconstruction, with MPE and RMSE as low as 3.46% and 90.14 ppb.

Fig. 3: Reconstructed surface CH4 concentrations and locations of sensors.

Spatial distributions of mean surface CH4 concentrations across China from (a) WRF-GHG and reconstructed results with (b) MCN, (c) Extrema and (e) QR pivot algorithms under the condition of 10 POD modes and 20 sensors (P = 2n); (d) DEIM under P = n; associated locations of sensors (f–i).

As noted in Luo, et al.31, the number of monitoring sites can influence assessments of the distribution of air pollutants. As shown in Fig. 4, reconstruction performance improves with increasing sensor quantity for all methods except DEIM, whose accuracy is limited by the condition that the sensor quantity equals the number of modes. The reconstruction accuracy of Extrema increases rapidly, reaching an R2 of 0.72 with 300 sensors. The reconstructed results using MCN are more consistent with the simulation in low CH4 regions because of its evenly distributed sensors (Fig. 3f). Thus, MCN has the smallest MPE among the four algorithms when the number of sensors is higher than 20, and its MPE reaches 1.62% with 300 sensors. QR pivot in general offers the best reconstruction and is competitive in both efficiency and accuracy. The RMSE and R2 of QR pivot with 100 sensors are 69.82 ppb and 0.86, approximately equal to those of MCN with 300 sensors. Therefore, the QR pivot algorithm is regarded as the more reliable method.

Fig. 4: Reconstruction errors with different sensor quantity.

(a) MPE, (b) RMSE and (c) R2 of MCN, Extrema and QR pivot with increasing numbers of sensors under the condition of P = 2n and DEIM algorithm under the condition of P = n.

Optimal planning of sensor locations

The locations of CH4 sampling sites documented in previous observation studies are summarized in Table 2 and are regarded as potential stations in this study. The differences between simulated and reconstructed surface CH4 using these potential stations and QR pivot-guided stations are shown in Fig. 5. In addition to GAW stations that focus on the background CH4 level, other field measurements were mainly conducted in megacities with larger influences of anthropogenic activities44,45,46, or in southern China to capture CH4 released from rice fields29,37. Thus, high CH4 concentrations over southern China are well reconstructed by using the potential stations located in urban and farmland areas. The reconstruction using the locations of potential sites is even slightly better than that of MCN, with MPE of 3.14%, R2 of 0.69 and RMSE of 104.16 ppb (Supplementary Table 1). However, the potential stations are likely to miss coal mining sources, leading to notable discrepancies in central and eastern China (Fig. 5a). The MPE and RMSE values of the reconstruction based on potential sites are 0.32% and 14.01 ppb larger than those of QR pivot.

Table 2 The locations of sampling sites from previous CH4 observation studies.
Fig. 5: Difference in reconstruction ability using potential sites and QR pivot algorithm.

Differences between simulated and reconstructed surface CH4 concentration using (a) potential sites and (b) QR pivot algorithm under the condition of P = 2n (n = 10); associated locations of sensors (c, d).

Although the QR pivot algorithm with 20 sensors can reproduce surface CH4 reasonably well, the results may be unacceptable, with RMSE up to 90 ppb. As shown in Fig. 4, the reconstruction performance of QR pivot improves significantly when the sensor quantity increases from 20 to 160. To provide implications for optimal planning of CH4 monitoring stations, the reconstruction performance of the QR pivot algorithm with 40, 100, 160, 200 and 300 sensors under the P = 2n condition is further investigated (Fig. 6). When the sensor quantity increases to 100, the reconstruction is more reliable, with RMSE of 69.82 ppb, but underestimates CH4 concentration by up to 60 ppb over southern China. Such underestimation does not exist in the reconstruction with 160 sensors. QR pivot with 160 sensors has exceptional reconstruction performance, with RMSE of 58.56 ppb, and notable bias cannot be seen over western China. With further growth of sensor quantity, only slight improvements are found in the reconstructions generated with 200 and 300 sensors, with RMSE of 55.96 and 48.46 ppb. This indicates that QR pivot with 160 sensors is suitable considering both reconstruction performance and the costs of site construction.
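The diminishing return described above can be made explicit with a short calculation. The sketch below uses only RMSE values quoted in this section (90.14 ppb for 20 sensors under P = 2n, and the values for 100, 160, 200 and 300 sensors); the variable names are illustrative, and the per-sensor gain is a simple finite difference rather than a metric used in the study.

```python
# RMSE (ppb) of the QR pivot reconstruction at the sensor counts quoted in the text.
sensors = [20, 100, 160, 200, 300]
rmse_ppb = [90.14, 69.82, 58.56, 55.96, 48.46]

# Average RMSE reduction per additional sensor between consecutive counts.
gains = [
    (rmse_ppb[i] - rmse_ppb[i + 1]) / (sensors[i + 1] - sensors[i])
    for i in range(len(sensors) - 1)
]

for (lo, hi), g in zip(zip(sensors[:-1], sensors[1:]), gains):
    print(f"{lo:>3} -> {hi:>3} sensors: {g:.3f} ppb per added sensor")
```

Under this reading, each added sensor beyond 160 reduces the RMSE by well under half of what it does below 160, which is consistent with treating 160 sensors as a cost-effective choice.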

Fig. 6: Reconstruction ability of QR pivot with increasing sensor quantity.

Differences between simulated and reconstructed surface CH4 concentration using QR pivot with (a) 40, (b) 100, (c) 160, (d) 200 and (e) 300 sensors under the condition of P = 2n, and associated locations of sensors (f–j).

Discussion

Increasing atmospheric CH4 concentration is of global concern with respect to climate change mitigation. China has emerged as the largest anthropogenic CH4 emitter and accounts for more than one-quarter of the increase in global anthropogenic CH4 emissions. However, current CH4 monitoring networks do not offer sufficient spatial coverage, which limits satellite evaluation, inversion of CH4 emission estimates, data assimilation and reconstruction of CH4 concentrations. To build a comprehensive atmospheric CH4 observing system, the locations of monitoring sites should be considered. In this study, we simulated the spatial distribution of CH4 over East Asia in 2017 using the WRF-GHG model and identified optimal site locations with four sensor placement algorithms. The influences of sensor quantity and locations on the reconstruction accuracy of the different algorithms were investigated systematically.

Model evaluations using satellite retrievals and surface observations indicated reliable performance of WRF-GHG in simulating the spatial and temporal distributions of CH4 in 2017. High surface CH4 centers were mainly located over Shanxi and Guizhou provinces, driven by coal mine CH4 emissions, and over eastern China, due to emissions from paddy fields and energy activities. The four POD-based sensor placement algorithms could capture the main spatial distribution features under an oversampled condition. When the sensor quantity equaled the number of POD modes, the reconstructed results from DEIM and QR pivot were substantially better than those from MCN and Extrema. Owing to its evenly distributed sensors, the reconstructed result from MCN was more consistent with the simulation in low CH4 regions, with the smallest MPE when the number of sensors was higher than 20. QR pivot showed the best performance in selecting optimal monitoring site locations, with both high efficiency and accuracy. When the locations of CH4 sampling sites from previous observation studies were used as potential stations, the reconstruction had MPE and RMSE values 0.32% and 14.01 ppb larger than those of QR pivot, and notable errors were found over central and eastern China. The reconstruction performance could be significantly improved by increasing the number of sensors until the sensor quantity reached 160. QR pivot with 160 sensors exhibited exceptional reconstruction performance, with RMSE of 58.56 ppb and overestimation only over low CH4 concentration regions (western China). Therefore, QR pivot with 160 sensors can provide an optimal plan for CH4 monitoring sites in China considering both reconstruction performance and the costs of site construction. Given that sensor placement algorithms are data-driven methods, the quality of the input data has a critical impact on their accuracy. WRF-GHG overestimated the CH4 concentration in western China, which can be partly attributed to the absence of chemical loss of CH4 in the model. Future work that advances modeling and combines modeling and observations to better derive the dynamical evolution of CH4 would be helpful. Our results can provide valuable references for future planning of CH4 monitoring sites.

Methods

WRF-GHG model

WRF-Chem Version 3.9.147, enhanced with a GHG module48, was used to simulate CH4 in China. WRF-Chem is a mesoscale coupled meteorology-chemistry model, and WRF-GHG is now a module in WRF-Chem for transport of CO2 and CH4 tracers47,49. CH4 in WRF-GHG, treated in a passive way, was transported online without atmospheric chemical reactions. The CH4 emission inventories were taken from Zhang, et al.36 with a spatial resolution of 0.5° × 0.625°, including emissions from biomass burning, coal, gas, landfills, livestock, oil, rice, geological seeps, termites, wastewater and wetlands.

The simulation domain covered the East Asia region with 115 × 164 grid points at a horizontal resolution of 36 km × 36 km. We used 29 vertical layers up to 50 hPa. The National Center for Environmental Prediction Final Analysis (NCEP FNL) dataset at a 6-hourly temporal interval and 1° × 1° horizontal resolution was used for the meteorological initial and boundary conditions. The initial and lateral boundary conditions for CH4 were taken from the GEOS-Chem simulations at 4° × 5° resolution of Lu, et al.16.

Observations

Both surface and GOSAT satellite observations were used to validate model performance. Daily surface CH4 concentrations at the WLG station were provided by the World Data Centre for Greenhouse Gases (https://gaw.kishou.go.jp/). The WLG baseline observatory is located in western China (36.29°N, 100.90°E), isolated from industrial and populated regions. Other sites located in China, i.e., Shangdianzi and Lulin stations, were not used in this study due to missing data for 2017 or low temporal resolution. GOSAT, launched in 2009, measures column-averaged dry CH4 mixing ratios with high precision of 0.7% from a polar sun-synchronous orbit at about 13:00 local time50. The University of Leicester version 9.0 Proxy XCH4 retrieval was used in this study, which has a global precision of 9 ppb51. It offers column-averaged dry-air mole fraction of CH4 based on GOSAT Level 1B data51.

Sensor placement algorithms within the POD framework

POD is a commonly used dimensionality reduction technique that produces low-dimensional dynamical systems capable of accurately modeling the spatiotemporal variations of the dominant structures in data. Gappy POD, developed by Everson and Sirovich52, is a modification of POD that handles incomplete data and can be used to reconstruct data from sparse sensor networks33,34. Suppose a snapshot \(\theta (x,t)\) over the domain at time step t contains K elements. It can be expressed as a linear combination of K time-invariant patterns:

$$\theta (x,t) = \alpha _1(t)\phi _1(x) + \alpha _2(t)\phi _2(x) + \cdots + \alpha _K(t)\phi _K(x)$$
(1)

where \(\{ \phi _i(x)\} _{i = 1}^K\) are space-varying basis functions, which are orthogonal to each other, and \(\{ \alpha _i(t)\} _{i = 1}^K\) are the corresponding time-varying coefficients. Equation (1) can also be approximated by a truncated expansion:

$$\theta (x,t) \approx \alpha _1(t)\phi _1(x) + \alpha _2(t)\phi _2(x) + \cdots + \alpha _n(t)\phi _n(x)$$
(2)

where n denotes the number of POD modes, which is smaller than K in Eq. (1). Gappy POD further defines a mask vector m(x,t) to describe where data are missing (m(x,t) = 0) or available (m(x,t) = 1). The masked data are obtained by pointwise multiplication, \(\Theta (x,t) = m(x,t) \cdot \theta (x,t)\). A “repaired” vector \(\theta _R\), built from the incomplete θ, can be represented as follows:

$$\theta _R(x,t) \approx \beta _1(t)\phi _1(x) + \beta _2(t)\phi _2(x) + \cdots + \beta _n(t)\phi _n(x)$$
(3)

where the coefficients β are computed by minimizing the difference between \(\theta _R\) and \(\Theta\) at the points where data are available. Differentiating this error with respect to β(t) yields the linear system Mβ = f32,52, where \(M_{ij} = (\phi _i,\phi _j)_n\) and \(f_i = (\Theta ,\phi _i)_n\); the subscripted inner product is restricted to the points where data are available (m = 1).
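The gappy POD steps in Eqs. (1)–(3) can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the code used in this study: the function names `pod_basis` and `gappy_reconstruct` are hypothetical, the POD modes are taken as the left singular vectors of the snapshot matrix, and the inner products in M and f are assumed to be sums over the observed points only.

```python
import numpy as np

def pod_basis(snapshots, n_modes):
    """Return the first n_modes POD modes (columns) of a K x T snapshot matrix."""
    # The left singular vectors are orthonormal spatial patterns phi_i(x).
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :n_modes]

def gappy_reconstruct(phi, theta_masked, mask):
    """Solve M beta = f on the observed points and return the repaired field."""
    obs = mask.astype(bool)
    phi_obs = phi[obs, :]                  # modes restricted to observed points
    M = phi_obs.T @ phi_obs                # M_ij = (phi_i, phi_j) over observed points
    f = phi_obs.T @ theta_masked[obs]      # f_i = (Theta, phi_i) over observed points
    beta = np.linalg.solve(M, f)
    return phi @ beta                      # theta_R, Eq. (3)

# Synthetic test case: K grid points, T snapshots, exactly n underlying patterns.
rng = np.random.default_rng(0)
K, T, n = 200, 50, 5
true_modes = np.linalg.qr(rng.standard_normal((K, n)))[0]
snapshots = true_modes @ rng.standard_normal((n, T))

phi = pod_basis(snapshots, n)

# Observe only 2n randomly chosen grid points (mask = 1 where data are available).
mask = np.zeros(K)
mask[rng.choice(K, size=2 * n, replace=False)] = 1

theta = snapshots[:, 0]
theta_R = gappy_reconstruct(phi, mask * theta, mask)
max_err = float(np.max(np.abs(theta_R - theta)))
print(f"maximum reconstruction error: {max_err:.2e}")
```

Because the synthetic field lies exactly in the span of the n retained modes, the reconstruction from 2n observed points is accurate to rounding error; for real fields the truncation in Eq. (2) and the choice of observed points both contribute to the error.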

To identify the optimal sensor locations, several sensor placement algorithms within the POD framework have been proposed. Willcox33 developed an algorithm that minimizes the MCN. The condition number of M is used to evaluate the reconstruction performance; it becomes larger than 1 for gappy data because the orthogonality of the POD modes is lost on the sampled points. The sensors are placed at the grid points that minimize the condition number, so as to preserve orthogonality. Yildirim, et al.34 proposed a method that selects the extrema of the POD modes (Extrema algorithm), which would maximally capture the variance. The sensors are placed at the locations of the maximum and minimum of each POD mode. To improve the dimension reduction efficiency, Chaturantabut and Sorensen42 proposed the simplified discrete empirical interpolation method (DEIM) to approximate the nonlinearity by discretely sampling and evaluating it. DEIM recursively learns the interpolation points (sensor locations) according to the maximum linear dependence error. In the QR pivot algorithm, a column permutation matrix D is introduced by QR factorization with column pivoting, AD = QR, where A is the matrix built from the POD modes. D contains ones and zeros and orders the columns of A so that the diagonal values of R decrease in magnitude, which is intended to maximize the absolute value of the determinant of M. The sensor locations can be obtained from D.
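As an illustration of the QR pivot idea, the sketch below selects sensors with `scipy.linalg.qr` and column pivoting. It covers only the P = n case, applies the factorization to the transposed POD basis (an assumption about what A is, since the text does not define it), and uses a synthetic orthonormal basis; the function name `qr_pivot_sensors` is hypothetical and oversampling (P > n) is not handled.

```python
import numpy as np
from scipy.linalg import qr

def qr_pivot_sensors(phi):
    """Select n sensor indices from a K x n POD basis by QR with column pivoting."""
    n = phi.shape[1]
    # Factorize A = phi^T (n x K). The pivot order ranks grid points (columns of A)
    # so that |R_11| >= |R_22| >= ... on the diagonal of R.
    _, R, pivots = qr(phi.T, mode="economic", pivoting=True)
    return pivots[:n], np.abs(np.diag(R))

# Synthetic orthonormal basis standing in for POD modes.
rng = np.random.default_rng(1)
K, n = 300, 6
phi = np.linalg.qr(rng.standard_normal((K, n)))[0]

sensors, r_diag = qr_pivot_sensors(phi)

# Reconstruct a field lying in the span of the modes from the n selected points.
theta = phi @ rng.standard_normal(n)
beta = np.linalg.solve(phi[sensors, :], theta[sensors])
recon_err = float(np.max(np.abs(phi @ beta - theta)))

print("selected grid indices:", sorted(sensors.tolist()))
print(f"maximum reconstruction error: {recon_err:.2e}")
```

The pivot vector returned by SciPy plays the role of the permutation matrix D: its first n entries are the grid indices used as sensor locations.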

The four sensor placement algorithms described above were applied to identify sensor locations and reconstruct the simulated surface CH4. The data matrix provided by WRF-GHG is composed of 365 snapshots, each containing 115 × 164 pixels. 10-fold cross validation was used to ensure the reliability of our results. Three evaluation metrics were used to assess the reconstruction ability of the POD-based sensor placement algorithms, namely MPE, R2 and RMSE, defined as follows:

$$MPE = \frac{{100\%}}{N}\mathop {\sum}\nolimits_{i = 1}^N {\frac{{\left| {Con_{s,i} - Con_{e,i}} \right|}}{{Con_{s,i}}}}$$
(4)
$$R^2 = \frac{{\left[ {\mathop {\sum}\nolimits_{i = 1}^N {\left( {Con_{s,i} - \overline {Con_s}} \right)\left( {Con_{e,i} - \overline {Con_e}} \right)}} \right]^2}}{{\mathop {\sum}\nolimits_{i = 1}^N {\left( {Con_{s,i} - \overline {Con_s}} \right)^2} \mathop {\sum}\nolimits_{i = 1}^N {\left( {Con_{e,i} - \overline {Con_e}} \right)^2}}}$$
(5)
$$RMSE = \sqrt {\frac{{\mathop {\sum}\nolimits_{i = 1}^N {\left( {Con_{s,i} - Con_{e,i}} \right)^2}}}{N}}$$
(6)

where \(Con_s\) and \(Con_e\) represent simulated and estimated surface CH4 concentrations, respectively, \(\overline {Con_s}\) and \(\overline {Con_e}\) are their mean values, and N is the total number of grid cells within China. To better compare the capabilities of the different algorithms, we initially used only 10 POD modes, and then explored the influence of the number of POD modes and sensors on reconstruction accuracy, with the sensor quantity ranging from 20 to 300.
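For reference, the three metrics in Eqs. (4)–(6) can be written compactly in NumPy. This is a minimal sketch with made-up concentration values, not data from this study; the function names are illustrative, and R2 is computed as the squared Pearson correlation coefficient, which is how Eq. (5) is read here.

```python
import numpy as np

def mpe(sim, est):
    """Mean percentage error (%), Eq. (4)."""
    return 100.0 * np.mean(np.abs(sim - est) / sim)

def r2(sim, est):
    """Squared Pearson correlation between simulated and estimated values, Eq. (5)."""
    ds, de = sim - sim.mean(), est - est.mean()
    return np.sum(ds * de) ** 2 / (np.sum(ds ** 2) * np.sum(de ** 2))

def rmse(sim, est):
    """Root-mean-square error, Eq. (6), in the units of the inputs."""
    return np.sqrt(np.mean((sim - est) ** 2))

# Hypothetical simulated and reconstructed surface CH4 values (ppb) at four grid cells.
sim = np.array([1900.0, 2000.0, 2100.0, 2500.0])
est = np.array([1910.0, 1990.0, 2120.0, 2450.0])

print(f"MPE  = {mpe(sim, est):.3f} %")
print(f"R2   = {r2(sim, est):.3f}")
print(f"RMSE = {rmse(sim, est):.2f} ppb")
```

In the study these metrics are evaluated over the N grid cells within China; the four-element arrays above only stand in for those fields.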