A Bayesian framework for deriving sector-based methane emissions from top-down fluxes

Atmospheric methane observations are used to test methane emission inventories as the sum of emissions should correspond to observed methane concentrations. Typically, concentrations are inversely projected to a net flux through an atmospheric chemistry-transport model. Current methods to partition net fluxes to underlying sector-based emissions often scale fluxes based on the relative weight of sectors in a prior inventory. However, this approach imposes correlation between emission sectors which may not exist. Here we present a Bayesian optimal estimation method that projects inverse methane fluxes directly to emission sectors while accounting uncertainty structure and spatial resolution of prior fluxes and emissions. We apply this method to satellite-derived fluxes over the U.S. and at higher resolution over the Permian Basin to demonstrate that we can characterize a sector-based emission budget. This approach provides more robust comparisons between different top-down estimates, critical for assessing the efficacy of policies intended to reduce emissions. Sector-based methane emissions can be backed out from observed methane fluxes, using a Bayesian optimal estimation method. This could help with monitoring gas leaks from industry.

A s the second highest greenhouse gas (GHG) contributor to global radiative forcing, understanding the global budget of methane (CH 4 ) is a top climate priority 1,2 . Methane is emitted from a variety of anthropogenic and natural emission sectors, including oil and gas operations, waste management, coal mining, agriculture, wetlands, and fires among others 3 . Article 14 of the Paris Agreement 4 requires participating countries to report progress towards achieving their climate mitigation goals, or nationally determined contributions. Reporting progress, including any changes in the CH 4 budget, necessitates inventorying all possible emission sources. CH 4 emission inventories can be constructed from "bottom-up" or derived from "topdown" observations. Bottom-up accounting relies on a knowledge of activity data and emission factors for anthropogenic sectors and/or detailed processed-based models that predict CH 4 emissions based on a set of environmental factors for natural emission sectors. By aggregating an ensemble of bottom-up inventories and process-models, Saunois et al. 5 4 emissions in some geographic regions may be caused by imprecise emission factors and activity data that are not readily available at necessary spatial and temporal scales, or by process-based models that perform poorly due to a host of environmental factors (e.g., wetland models rely on wetland inundation maps, biogeochemical process parameterizations, and knowledge of carbon availability 6,7 ).
Atmospheric observations of CH 4 , combined with an atmospheric transport model and a regularizing statistical approach, provide a top-down constraint on the global CH 4 budget. Typically referred to as "inversions" or top-down inventories, these methods estimate CH 4 fluxes by assimilating tower, aircraft, or satellite-based CH 4 measurements [8][9][10][11][12][13][14] . Generally, these top-down methods only estimate total fluxes (i.e., sum of all emission sector contributions) explicitly, and may rely on using prior ratios or relative weights (RWs) between source categories to partition fluxes to specific source sectors. Top-down inverse models may be driven by different regularization or prior conditions, complicating direct comparison between an ensemble of inventories. Therefore, these partitioning approaches are prone to error when comparing with bottom-up inventories if the prior distribution of emissions is biased, or if different sectors have different uncertainties. For example, Saunois et al. compared an ensemble of 22 top-down CH 4 global inversions to an ensemble of bottom-up inventories, and found the bottom-up estimate to be 30% higher than the top-down ensemble mean. The study attributes much of this total discrepancy to large differences in non-wetland natural sources (e.g., lakes and rivers, oceans seeps, termites, geologic sources, wild animals, etc. Bayesian estimation of emissions from fluxes. We propose a comprehensive Bayesian framework that derives top-down gridded CH 4 emissions (ẑ) and their error covariance (Ẑ) from inverse fluxes (x) and their error covariance (Ŝ) without reliance on RWs from an inventory. This framework takes the following form: Where the vectorx represents inverse CH 4 fluxes on a spatial grid with full error characterization given by the covariance matrixŜ, x A is the vector of prior fluxes, I is the identity matrix, S A is the prior error covariance matrix, and M is an aggregation matrix that sums emissions to fluxes (Methods section: Eqs. [8][9]. The posterior emission error covariance matrixẐ is calculated explicitly given M, S A ,Ŝ, and prior emissions error covariance matrix The full derivation is documented in the Methods section and Supporting Information (SI) and has been mechanically verified and tested using simulated emissions, concentrations, and fluxes. This approach has been previously described for atmospheric trace gas retrievals 15 but is here modified and applied for flux comparisons.
Applying Eqs. 1 and 2 allows for the ability to "swap priors." This means that no matter what prior was used in an initial flux inversion, we can swap it with a different prior emissions vector z A , which can include sector-based information. This a critical component for comparison between two different top-down inventories, as it removes error that may arise from choice of prior. Another advantage of this Bayesian approach is that fluxes are partitioned according to not just the prior emission state z A , but also according prior uncertainties on those emissions (i.e., Z A ). For example, if we have evidence that a particular emission sector is well-characterized in bottom-up inventories (i.e., a tight prior uncertainty), Eqs. 1-2 take this knowledge into account when optimizing emissions. In this sense, our approach is similar to updated methods that use prior ratios with prior error variance when computing RWs used for partitioning 16 . However, any RWbased approach still assumes that correlation exists between emission sectors at the grid-level, creating a relationship that may bias results depending on prior construction.
The main caveat of this Bayesian approach is that an explicit representation ofŜ is needed. For analytical flux inversions, this matrix has a closed-form representation that is computed as part of the inversion. For adjoint-based inversions 10,11 , a closed-form representation ofŜ is not directly computed. Though the error covariance can still be estimated 17,18 , this is computationally expensive and is generally not done. Analytical frameworks are best suited for direct comparison between inverse products due to explicit error characterization.

Results
In what follows, we show examples of this approach using a previously performed 2010-2015 GOSAT flux inversion of the Continental United States 9 (CONUS). We also apply Eqs. 1 and 2 to a previously performed 2018-2019 TROPOMI flux inversion over the Permian Basin in western Texas, southern New Mexico 19 . We compare with the partitioned results from the 2010-2015 GOSAT inversion to show how projection to a common prior can be used to assess regional emission trends since uncertainties from different priors and different spatial resolutions of the flux inversions are removed.
Partitioning CH 4 fluxes over CONUS. We apply the partitioning algorithm described by Eqs. 1 and 2 to a 2010-2015 0.50 × 0.625°r esolution North American CH 4 flux inversion 6 performed using Greenhouse Gas Observing Satellite 20 (GOSAT) dry air column mixing ratios of CH 4 . We partition emission to seven distinct sectors at 0.1 × 0.1°resolution: oil, gas, coal, livestock, waste management, wetlands, and other emission sources (soil, fire, etc.). For oil, gas, and coal prior emissions and uncertainties, we use a global inventory for 2016 based on national reports to the United Nations Framework Convention on Climate Change (Scarpelli et al. 21 ). For wetland prior emissions and uncertainties, we take the ensemble average and standard deviation of wetland models that were found to be the highest performing when compared with a global CH 4 flux inversion 22 . For all other emission sectors, we use the 2012 EPA gridded CH 4 inventory as the prior estimate 23 , and assume a one standard deviation uncertainty equal to 50% of the mean value. For illustration and computational tractability, the prior error covariances are represented as diagonal matrices. Figure 1a shows the inverse fluxesx over CONUS, optimized emissionsẑ for the gas and livestock sectors when applying Eqs. 1-2, and the change in emissions compared to the emission prior ðẑ À z A Þ. Other optimized emission sectors are shown in   Fig. 1a is partitioned entirely to the oil sector, but produces emissions lower than the prior, which contrasts with the increasing production reported in the basin 24 . This emission reduction may be due to an overestimate in the prior inventory or a difficulty in GOSAT sampling over that region, factors which could be verified with additional study. Posterior livestock emissions show increases from the prior that are distributed across the central and eastern United States, and a 0.07 TgCH4 a −1 decrease in emissions over central California.
A major advantage of using Eqs. 1-2 to estimate emissions from independent inverse fluxes is that any discrepancies in flux priors can be accounted for when partitioning to a common emission prior. We show this through a sensitivity study whose results are summarized in Fig. 2. Here, we use two inverse flux products: (1) the 2010-2015 GOSAT CONUS flux shown in Fig. 1a, and (2) inverse fluxes from 2010-2015 GOSAT recomputed by using the EDGAR v5.0 emission inventory for oil, gas, waste, and livestock sectors for 2015 25 . Differences in these prior inventories are summarized in Table S1 and Figs. S2-S3. Global inventories like EDGAR use consistent bottom-up aggregation methods across countries, which sometimes leads to discrepancies when compared with national inventories for particular sectors 21 . We employ two methods for optimizing/partitioning inverse fluxes: (1) using optimal estimation (OE) from this study's Eqs. 1-2 to optimize the same emissions prior from Fig. 1, and (2) by computing RWs of each emission sector in the flux prior and partitioning the inverse fluxes to emissions using these weights. In Fig. 2 we show that by employing OE on each separate inverse flux, we get the exact same answer for optimized emissions. This is a result of the ðI À SS À1 A Þ term from Eq. 1. Sometimes called the averaging kernel matrix 26 , this term represents the spatial resolution of the flux estimate, or alternatively, the degree of smoothing of the flux prior to the estimate. Since the averaging kernel is applied to the prior swap term ðx A À Mz A Þ, we account for discrepancies between flux and emission priors as a condition of emission optimization, so the choice of flux prior is immaterial. However, Fig. 2 shows the result when a relative weighting scheme is used for partitioning. The livestock to gas ratio in EDGAR is higher than in the EPA and Scarpelli et al. inventories. The result is that RW-partitioned livestock emissions are higher and gas emissions are lower when using EDGAR RWs (13.2 TgCH 4 a −1 and 5.8 TgCH 4 a −1 , respectively) than when using EPA-Scarpelli RWs (11.3 TgCH 4 a −1 and 8.1 TgCH 4 a −1 , respectively). Similarly, wetland and waste emissions differ depending on which prior are used for RWs. Using prior ratios assumes total correlation between emission sectors in each grid cell as each sector's RW depends on emissions from other sectors. Therefore, we see that different assumptions on the flux prior can complicate comparison even between inverse fluxes that use the same atmospheric observations and transport model.
Comparing inverse fluxes in the Permian Basin. As seen from Figs. 1 and S1, the Permian Basin shows large increases in CH4 emissions for both oil and gas sectors compared to the prior inventory. Production data from the Energy Information Administration 24 (EIA) indicates that between 2010 and 2019, the Permian increased oil production by 190% and gas production by 150%. We expect that fugitive emissions from these sectors would increase proportionately to the production increases, but the actual posterior estimates do not differentiate between intentional (planned) and unintentional (unplanned) emissions. The sensitivity study in Fig. 2 shows that the application of Eqs. 1-2 can be used to compare distinct inverse fluxes when a common emissions prior is used. We compare the 2010-2015 GOSAT flux product with a Permian 0.25°× 0.325°flux product 19 based on May 2018-March 2019 TROPOspheric Monitoring Instrument 27 (TROPOMI). We partition fluxes to the same sectors as Fig. 1, but at a finer 0.1 × 0.1°grid resolution.  Fig. 3), the GOSAT inverse flux estimate is 2.01 ± 0.01 TgCH 4 a −1 and the TROPOMI inverse flux estimate is 2.68 ± 0.5 TgCH 4 a −1 , though these inversions were performed over different time periods and used different flux priors and prior error covariance matrices to constrain their solutions. The GOSAT and TRO-POMI inverse products show distinct spatial patterns. The TROPOMI inversion shows two main regions of elevated CH4 flux, which correspond to the Delaware Basin on the western side of the Permian (west Texas, southeast New Mexico) and the Midland Basin on the eastern side of the Permian. The GOSAT inversion shows a more distributed region of flux enhancement across the Permian. The difference in spatial distribution of these CH 4 flux maps could be due to the more limited GOSAT spatial observing coverage over the Permian and the coarser spatial resolution over which the 2010-2015 GOSAT flux inversion was  (Table S1). Red bars represent fluxes that were derived using EDGAR v5.0 for oil, gas, livestock, and waste sectors. Solid bars represent the optimal emission (OE) approach from Eqs. 1-2. Hatched bars represent partitioning done by computing the relative weights (RW) of emission sectors that made up the flux prior.
performed. However, given that the two flux inversions are asynchronous and that oil and gas production increased dramatically between 2015 and 2018, spatial differences in CH 4 fluxes could also be the result of changing infrastructure throughout the basin. Figure 3d, e show the optimized emissions for the oil, gas, and livestock sectors in the Permian, respectively, and compared to the prior inventory (Fig. 3c). Optimized emissions are derived from GOSAT and TROPOMI fluxes (Fig. 3a, b) that were swapped with a consistent prior (Fig. 3c) using Eqs. 1 and 2. The partitioned GOSAT emissions continue to show more distributed oil and gas CH 4 emissions across the basin when compared to the partitioned TROPOMI emissions, which are mostly concentrated to the Delaware and Midland Basins. Figure 4 shows the aggregated topdown oil and gas emissions in the Permian compared with reported trends from EIA production reports. The mean 2010-2015 oil and gas production in the Permian was 1.3 million bbl per day and 5.0 million Mcf per day, respectively. This increased to 3.8 million bbl per day and 12.7 million Mcf per day between May 2018 and March 2019 mean production, respectively 24 . Though not linear with increased production, comparing partitioned 2010-2015 GOSAT oil and gas emissions and 2018-2019 TROPOMI emissions, we see a 0.52 ± 0.29 TgCH 4 a −1 change in CH 4 emissions, representing a 29% increase. Optimized emissions summed across all sectors increased by 0.42 ± 0.33 TgCH 4 a −1 , with the increase driven mostly by gas, some contribution by oil, and offset by a small decrease in livestock emissions. Originally reported posterior flux estimates showed a 0.67 ± 0.5 TgCH 4 a −1 difference between TROPOMI and GOSAT inversions, larger than the 0.42 ± 0.33 TgCH 4 a −1 difference we quantify here after reprojection to a common prior. This difference discrepancy between top-down estimates corresponds explicitly to differences in flux priors used in the original inversions that is now accounted for with this approach. Therefore, we can quantify how much the choice of flux prior directly impacts any quantified changes between top-down inventories.
Though consistent with the increase in gas production, the quantified 0.42 TgCH 4 a −1 increase in oil and gas emissions from 2010-2015 to 2018-2019 could also be due to observing constraints and/or biases in satellite retrievals. For example, Qu et al. 28 performed global 2 × 2.5°CH 4 inversions for 2019 using GOSAT and TROPOMI using the same prior and atmospheric chemistrytransport model, and derived Permian net fluxes of 2.36 TgCH 4 a −1 and 2.43 TgCH 4 a −1 , respectively, showing consistency between signals observed by these independent remote sensing platforms for this region. Therefore, we conclude that the trends observed in Fig. 3 are likely due to changes in gas operations, and not bias in observing systems. However, in other global regions where the surface is less bright and homogeneous than the Permian, flux results derived from TROPOMI and GOSAT inversion may not agree 28 .

Discussion
Having robust intercomparison methods in place are needed for interpreting the ever increasing number of atmospheric observations of CH 4 , particularly with regard to the expected launches of several CH 4 -observing satellites in the 2020s 29 . As described in this study, inverse fluxes derived from these satellite observations will depend on the observations themselves, the chemical transport model, and the prior constraint. Having the ability to directly quantify how these terms influence emissions is needed for diagnosing uncertainty in the estimated methane budget. Ultimately, this information can be combined to provide better global understanding of methane emissions and finer and more policyrelevant spatial and temporal scales. Likewise, the Bayesian framework we describe in this study can be applied to other atmospheric gas fluxes like carbon dioxide (CO 2 ). Partitioning between biospheric and anthropogenic CO 2 emissions remains highly uncertain 30 , so incorporating this framework that directly optimizes emission sectors could be useful for reconciling the budget. This approach does require a move from traditional ensemble or adjoint-based inversions, created to reduce cost of this computationally expensive problem, to an analytic or optimal estimation inversion as an explicit representation of the posterior covariance is required and this covariance is not easily calculated from ensemble or 4D-var methods.
While our estimates account for the spatial resolution and error associated with the inversion of observations to fluxes, we do not explicitly account for error in model transport and chemistry. These errors could be important when comparing emissions between seasons or in regions where transport is poorly modeled such as in the tropics. For example, a study of global carbon monoxide emissions, a trace gas that (like CH 4 ) is affected by reaction with the hydroxl radical (OH) and transport, found that convective mass flux in the tropics is likely responsible for errors in emissions 31 . While our approach can account for this error if a corresponding posterior covariance is provided, we emphasize that studies that both characterize and mitigate this part of the flux error budget are needed to better use satellite observations of methane and of other trace gases. Ultimately, improved emission and error characterization from top-down information will allow for better updates and comparisons with bottom-up inventories, which can guide progress towards CH 4 mitigation.

Methods
In this section we derive a method to estimate CH 4 emissions from atmospheric observations. The SI contains and alternate derivation (Section S1) and conceptual examples to further clarify the mechanics of prior-swapping. Emissions can be represented as a vector i.e., ðz 2 R m Þ that contains both sectoral and spatial information. Atmospheric observations i.e., ðy 2 R p Þ often represent concentrations of CH 4 observed by surface, satellite, airborne, or some other observing system. Generally, atmospheric inversions do not directly optimize CH 4 sectoral emissions from observations, and instead optimize CH 4 fluxes i.e., ðx 2 R n Þ, which represent the summation of CH 4 emissions within a grid cell. In the flux inversion setup, atmospheric CH 4 observations y are used to estimate CH 4 fluxes x. We estimate this optimal state by finding the mode of the posterior flux distribution p (x|y), orx. A transformation or Jacobian matrix K can be derived from atmospheric transport simulations (e.g., GEOS-Chem), such that we can represent the relationship between fluxes and observations: Where n represents noise. We apply Bayes Theorem to estimate the posterior distribution p (x|y): pðxjyÞ / pðyjxÞpðxÞ ð 4Þ Where p (y|x) is the maximum likelihood given by Eq. 3, and pðxÞ is the prior distribution. If we assume that p (y|x) and p(x) are Gaussian distributions, and y and x A represent the modes of those respective distributions, then the mode of the posterior distributionx has a closed form solution, known as the Maximum A Posteriori (MAP; Rodgers, 2000) solution: x ¼ x A þŜK T S À1 y ðy À Kx A Þ ð5Þ For policy-relevance and CH 4 budget quantification, we really wish to optimize emissions using atmospheric observations, i.e., we want to compute the explicit posterior representation pðz j yÞ without re-simulation of an atmospheric transport model. The relationship between z and x is simple aggregation, and can represented by matrix M: If z and x share the same grid resolution (i.e., if z 2 R m and x 2 R n and s is the number of emission sectors, then m = ns), the matrix M 2 R n m is represented with the following terms: If z and x pertain to different grids, the relationship M is defined by the geographic area (Ω) and intersections (\) of grid cells: Since M is simply a summation matrix, we assume there is no noise associated with its application. Using M, we can update Eqs. 5 and 6 to find the optimal emission state vectorẑ and its posterior error covarianceẐ: Equations 10 and 11 provide an explicit closed form solution forẑ, which is sufficient for emission optimization without reliance on RWs. Therefore, application of Eqs. 10 and 11 into existing inverse frameworks would provide posterior emission estimates constrained by atmospheric observations. However, these equations require the computation of the matrix (KM), which can be large in cases where many atmospheric observations exist. An exactly equivalent solution is possible with just the products of the flux inversion, specificallyŜ,x, S A , and x A . We show one derivation below and provide an alternative derivation in the SI: can be shown to have an equivalent form that is often used in atmospheric trace gas retrievals 26 : Where A is the averaging kernel matrix A ¼ ∂x ∂x : And G is the Gain Matrix G ¼ ∂x ∂y : And x are the true atmospheric fluxes. Therefore, Eq. 12 shows that the optimal solutionx is a combination of the truth, smoothed by some prior and includes noise. From Eq. 12, we can create a flux to posterior flux operation H, given that our relationship M is known: HðMzÞ The operation H allows for emissions to be smoothed with an averaging kernel, allowing for direct comparison withx. With this operator relationship, we treatx as an observable. The error covarianceŜ includes both smoothing (S s ) and measurement error (S m ):Ŝ For flux partitioning, we want to isolate the S m error componentx, as the H operator already accounts for smoothing via the averaging kernel. The matrix S m can be represented 32 using G and S y : While S s has the following representation: We can combine Eqs. 17 and 19 to get an alternate form of S m that does not require S y and G explicitly: Using S m for observational error covariance, we can apply the MAP solution to deriveẑ andẐ:ẑ Now we have a direct solution forẑ andẐ derived only from the products of a flux inversion (specifically, x A ; S A ,x,Ŝ), an emissions prior (Z A ), and an aggregation matrix M. Equations 22 and 23 can be shown to be of the same form as Eqs. 1 and 2 by showing that A T S À1 m ¼Ŝ À1 . Decomposing A T and recognizing that S and S A are symmetric matrices, we have We first expand Eq. 19 for an alternative expression of S m : S m ¼Ŝ À ðI À AÞS A ðI À AÞ T ð25Þ ¼Ŝ À ðI À I þŜS À1 Taking the inverse of S m using the Eq. 27, we have: The algebra for 28 and 29 are only possible if ðA T Þ À1 exists. For overdetermined systems (i.e., dimension of y » dimension of x), this is generally valid. Multiplying both sides of Eq. 29 by A T finishes the proof: In a similar fashion, we can show that Eqs. 2 and 23 are equivalent if A T S À1 m A ¼Ŝ À1 À S À1 A . To do this, we multiply Eq. 30 by A: