Introduction

Following the Montreal Protocol on Substances that Deplete the Ozone Layer1, the use of methyl bromide (CH3Br) as a fumigant was largely phased out by 2005, with some exemptions for critical and quarantine/pre-shipment (QPS) uses. As a result, the atmospheric abundance of CH3Br has declined, and the stratospheric ozone layer is showing signs of recovery in recent years due to the phase-out of CH3Br and other ozone-depleting substances (ODSs)2,3,4. However, an unintended consequence of the CH3Br phase-out is that global use of sulfuryl fluoride (SO2F2), a non-ODS fumigant alternative to CH3Br, has dramatically increased5,6. Like CH3Br, SO2F2 is a broad spectrum pesticide used to exterminate structural, commodity, and stored product pests such as drywood termites, subterranean termites, cockroaches, wood-boring beetles, moths, bed bugs, and rodents7,8,9. While SO2F2 is not an ODS, it is a potent greenhouse gas (GHG) that is accumulating in the global atmosphere, with an ambient air mole fraction that has risen from ~0.3 parts per trillion (ppt) in 1978 to nearly 3.0 ppt in 20235,10.

Recent measurements and global modeling from the Advanced Global Atmospheric Gases Experiment (AGAGE) indicate that global emissions of SO2F2 have reached a historic high of nearly 3.0 Gg yr−1, and global mean atmospheric mole fractions of SO2F2 have continued to increase at a rate of nearly 0.1 ppt yr−1 (4% yr−1) from 2015 to 20234,5,11. At present, the effective radiative forcing of SO2F2 is small at ~0.5 mW m−2, but increasing as global mole fractions of SO2F2 continue to rise12. Rising emissions of SO2F2 are a concern because SO2F2 exhibits strong infrared absorption properties, has a relatively long atmospheric lifetime, and therefore has a high global warming potential (GWP)10,12,13,14,15,16.

When SO2F2 was first approved for use as a fumigant/pesticide by regulators in 1959, its atmospheric lifetime was thought to be negligible, and thus the study of its environmental fate was largely neglected17,18. However, seminal studies published in 2008–2009 on the atmospheric chemistry of SO2F2 demonstrated that removal of SO2F2 from the atmosphere is predominantly mediated by ocean uptake and hydrolysis, resulting in a relatively long effective atmospheric lifetime of 36 (±11) years10,13,14,15. Consequently, the GWP of SO2F2 was revised to be much larger than initially thought, recently estimated at 7510 for a 20-year time horizon, or 4630 for a 100-year time horizon4,12. (See Supplementary Note 1 for additional background on SO2F2 in the atmosphere.) The Intergovernmental Panel on Climate Change (IPCC) characterized SO2F2 as a well-mixed GHG with a high GWP in its 2013 Fifth Assessment Report16. Yet, SO2F2 was not included in the 2015 Paris Climate Agreement, nor in subsequent international legislation on climate change and GHG emissions under the United Nations Framework Convention on Climate Change (UNFCCC)19,20.

Despite recent advances in scientific knowledge of atmospheric SO2F2, no top-down, atmospheric measurement-based estimate of SO2F2 emissions exists for the U.S. There is also no national inventory of SO2F2 use, and SO2F2 is not included in the EPA Greenhouse Gas Reporting Program (GHGRP) or National Greenhouse Gas Inventory (GHGI)21,22. Furthermore, California is the only state that keeps a public record of statewide SO2F2 use23. The lack of inventory data on SO2F2 use complicates attempts to constrain U.S.-wide emissions of SO2F224. To address this challenge, we utilize inverse modeling to infer SO2F2 emissions directly from high-precision measurements of atmospheric SO2F2 collected across North America throughout 2015–2019. We analyze these measurements using a geostatistical inverse model (GIM) to estimate surface emissions of SO2F2 over the continental U.S., and we compare our top-down inverse model estimates with available state records of SO2F2 use23 and recent global emissions rate estimates from AGAGE5. Overall, we find that the largest SO2F2 emissions are from California, with zero to low emissions across most of the rest of the U.S.

Results and discussion

Atmospheric measurements of SO2F2 reveal an unusual pattern

We use measurements of SO2F2 from the NOAA Global Monitoring Laboratory (NOAA/GML) Global Greenhouse Gas Reference Network (GGGRN) collected throughout 2015–2019 at sites shown in Fig. 1 as the principal data constraint in our analysis11. Most GGGRN measurements (>90%) show SO2F2 mole fractions <5 ppt and are clustered around the global background mole fraction of 2.0–2.5 ppt (Fig. 2a). However, numerous enhancements (10–450 ppt) are also observed throughout the study period (Fig. 2b), 98% of them at sites in California. The vast majority of enhancements (85%) are observed at either the Los Angeles Megacity Carbon Project (LAC) observation sites25,26 or the Mt. Wilson Observatory (MWO) in Los Angeles, California27. Notably, measurements at the LAC sites are only available between June 2015 and October 2017 (Ndays = 852), but measurements at MWO are ongoing11. Numerous SO2F2 enhancements are also observed at two other sites in central California (Sutro tower (STR) and Walnut Grove tower (WGC), N = 117), while none are observed during aircraft flights at Trinidad Head (THD). For additional details on how these SO2F2 enhancements are defined relative to the global background SO2F2 mole fraction, see Supplementary Note 3 (Supplementary Figs. 4–7).

Fig. 1: Map of the mean footprint (sensitivity) of NOAA GGGRN observations to SO2F2 emissions.

The mean footprint in each grid box is colored by quintile in shades of blue (2015-2019, Nobs = 15,385). Quintiles are defined relative to the maximum of the daily mean footprint. Red markers show GGGRN tower/surface sites, yellow markers represent locations of regular NOAA/GML vertical-profiling aircraft sites, and small orange markers indicate locations of individual samples collected during intensive aircraft campaigns.

Fig. 2: Frequency of observed SO2F2 mole fractions in NOAA GGGRN measurements, colored by site.

Panel (a) shows a frequency histogram of GGGRN observations <5.0 ppt (bin size = 0.25 ppt). Panel (b) shows a frequency histogram of GGGRN observations >10.0 ppt (bin size = 10.0 ppt). The vertical dotted and dashed lines indicate the mean and median observed values, respectively.

Outside of California, SO2F2 enhancements occur very infrequently (N = 16, <1 yr−1) at only a few sites. In addition, no monitoring site outside of California records the frequent (sub-weekly) and large (>10×) SO2F2 enhancements that are characteristic of the LAC and MWO sites in Los Angeles. However, it is important to note that SO2F2 enhancements do not translate linearly to SO2F2 emissions, since other factors such as wind direction, wind speed, atmospheric boundary layer height, and the distance between the emissions and the observation site can influence the measured enhancements. While Fig. 2 indicates that the likelihood of large emissions from California is higher than elsewhere, an inverse model is needed to infer the emissions fluxes (and corresponding uncertainties) from the measured enhancements.

We correlate these measured atmospheric mole fractions of SO2F2 with estimated surface emissions using an atmospheric transport model. We specifically calculate daily, gridded footprints, which quantitatively describe the potential of upwind locations to influence SO2F2 mole fractions at downwind observation sites. These footprints have units of atmospheric SO2F2 mole fraction (ppt) per unit emission, and they are a measure of the effective geographic coverage of the observation network. A map of the mean footprint across all GGGRN measurements is shown in Fig. 1 to illustrate the broad spatial coverage of emissions monitoring provided by the GGGRN. The footprint map indicates that if SO2F2 emissions occurred within most of the highly-populated regions of the U.S., SO2F2 enhancements would be detected by at least one of the GGGRN sites nearby.

Largest U.S. SO2F2 emissions are from California

We use a geostatistical inverse model (GIM) to infer surface emissions of SO2F2 from the GGGRN atmospheric measurements28,29,30. The GIM leverages atmospheric observations, coupled with an atmospheric transport model, to interpolate grid-scale SO2F2 emissions across the continental U.S. GIMs have been used widely in the atmospheric science community to estimate trace gas emissions given a set of atmospheric mole fraction measurements e.g.,31,32,33,34,35,36. Importantly, the GIM also provides a statistical methodology for calculating uncertainties in the estimated emissions28,30.

Our inverse modeling results exhibit a dichotomy between the state of California and the rest of the U.S., shown in Fig. 3. Emissions from California are both large in magnitude and persistent throughout the study period, especially from coastal southern California (Los Angeles, Orange, and San Diego counties). Specifically, we estimate the annual mean SO2F2 emissions rate from California at 0.26 (±0.10) Gg yr−1. Uncertainties are reported as 2-σ errors (equivalent to 95% confidence intervals), with errors calculated from the posterior covariance matrix from the GIM29. Posterior uncertainties from the GIM include the aggregate effects of measurement, model, atmospheric transport, representation, and spatial/temporal aggregation errors. For more details on the GIM, see the Methods section and Supplementary Note 4 (and references therein).

Fig. 3: GIM estimate of SO2F2 emissions across the continental U.S.

The largest emissions, shown in red, are from California. Across most of the rest of the U.S., emissions are low to zero and sparse, shown in white.

We estimate the annual mean emissions rate for the entire continental U.S. at 0.30 (±0.18) Gg yr−1, with the bulk of those emissions (60–85%) coming from California. Outside of California, emissions are small and sparse (Fig. 3), with an annual mean emissions rate of 0.05 (±0.16) Gg yr−1 for all other continental U.S. states combined. For many states with a colder climate, this result may be expected, as the threat from wood-destroying termites and thus the need for structural fumigation with SO2F2 is lower outside of the warm coastal regions. However, this result of low emissions for other warm coastal regions, such as the Gulf Coast and the Atlantic Coast, may appear surprising.

While the magnitude of emissions from California is substantially larger than the rest of the U.S., the corresponding uncertainties are smaller for California than for the rest of the U.S., largely due to two main factors. First, emissions totals are summed over a much larger area outside of California, resulting in a larger uncertainty. Second, there is a higher spatial density of observations in California than for most other parts of the U.S., resulting in smaller uncertainties for California and larger uncertainties for regions that are not as densely sampled spatially. The sparsity of the observation network, combined with the ephemeral nature of SO2F2 fumigation events, makes SO2F2 emissions especially uncertain outside of California. For additional information on the inverse model best estimate of U.S. SO2F2 emissions, see Supplementary Note 5 (Supplementary Figs. 11–14).
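The first factor above can be illustrated with a short sketch. It assumes that a regional total is a simple sum of grid-cell emissions and that its variance is the sum of the corresponding block of the posterior covariance matrix; the function name and all numbers are hypothetical toy values, not results from this study.

```python
import math


def regional_total_and_two_sigma(s_hat, post_cov, in_region):
    """Sum gridded emissions over a region and propagate the posterior covariance.

    s_hat     : best-estimate emissions per grid cell (same mass-per-time units)
    post_cov  : posterior covariance matrix of s_hat (list of lists)
    in_region : booleans marking which grid cells belong to the region
    """
    cells = [i for i, inside in enumerate(in_region) if inside]
    total = sum(s_hat[i] for i in cells)
    # Variance of a sum = all variances and covariances among the summed cells.
    variance = sum(post_cov[i][j] for i in cells for j in cells)
    return total, 2.0 * math.sqrt(variance)


# Toy values (assumed for illustration): two well-constrained cells and two
# poorly constrained, positively correlated cells.
s_hat = [3.0, 1.0, 0.0, 0.0]
post_cov = [
    [0.09, 0.00, 0.0, 0.0],
    [0.00, 0.16, 0.0, 0.0],
    [0.00, 0.00, 1.0, 0.5],
    [0.00, 0.00, 0.5, 1.0],
]
dense_total, dense_2s = regional_total_and_two_sigma(
    s_hat, post_cov, [True, True, False, False])
sparse_total, sparse_2s = regional_total_and_two_sigma(
    s_hat, post_cov, [False, False, True, True])
print(f"densely observed region:  {dense_total:.2f} +/- {dense_2s:.2f}")
print(f"sparsely observed region: {sparse_total:.2f} +/- {sparse_2s:.2f}")
```

In this toy case the region with near-zero emissions carries the larger 2-σ uncertainty, because its cells are individually less constrained and their errors are correlated.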

An important caveat of our results is that the GGGRN is not highly sensitive to emissions from Florida (Fig. 1), a state that is at high risk for termite infestations37, potentially resulting in SO2F2 use and emissions38,39. In Supplementary Note 7 (Supplementary Figs. 20–22), we conduct a sensitivity test and find that if daily emissions from Florida were comparable in magnitude to SO2F2 emissions from California, those emissions would be regularly detected at the closest tower site in South Carolina (SCT). However, the actual GGGRN measurements at SCT, as well as the ECO flights, show no evidence of SO2F2 emissions in all but one sample. While our present work does not rule out sparse and/or infrequent SO2F2 emissions from Florida, we find that the annual emissions of SO2F2 from California, relative to the southeast U.S., are substantially larger. At the very least, our study underscores the potential utility of adding a long-term GGGRN site in Florida, which would help constrain emissions of GHGs and other natural and anthropogenic trace gases in this otherwise data-sparse region of the U.S.

Majority of emissions likely from structural fumigation

State records from the California Department of Pesticide Regulation (CDPR) indicate that ~85% of SO2F2 use (by mass) in California is for structural fumigation, while ~15% is for agricultural and commodity fumigation23. We see a similar end use disparity in our inversion results (Fig. 4).

Fig. 4: Emissions maps showing the relative contribution of the predictor variables and the stochastic variable to the GIM best estimate of emissions in California.

Panel (a) shows the GIM best estimate of SO2F2 emissions. Panel (b) shows the mean contribution of the stochastic variable. Panel (c) shows the contribution of the structural fumigation predictor variable only, and Panel (d) shows the contribution of the agricultural/commodity fumigation predictor variable only. The emissions pattern in Panel (a) is a linear combination of the emissions patterns in Panels (b), (c), and (d).

To help estimate the contribution of these different SO2F2 sources, the GIM allows the inclusion of spatially explicit predictor variables30,32. We find that two variables, both defined from CDPR data, are optimal predictor variables: (1) county-level SO2F2 use for structural fumigation in California (Fig. 4c) and (2) county-level SO2F2 use for agricultural and commodity fumigation in California (Fig. 4d)23. The CDPR data on SO2F2 usage is described in further detail in Supplementary Note 2 (Supplementary Figs. 1-3).

The GIM also includes a stochastic component (Fig. 4b), which is an estimate of spatial and temporal emissions patterns that are not already described by the predictor variables. By construction, the stochastic variable follows a multivariate normal distribution with a mean of zero30. The overall modeled emissions (Fig. 4a) are a linear combination of the predictor variables and the stochastic variable. The stochastic variable and the relative weights on the predictor variables are optimized as parameters in the inverse model to best match the atmospheric observations and are not known a priori.

Figure 4 shows the relative contribution of the predictor variables and the stochastic term to the total emissions in Fig. 3. Consistent with the CDPR inventory, the largest contribution to the emissions totals arises from the predictor variable defined by the CDPR dataset of county-level SO2F2 use for structural fumigation. Together, our inversion results and the CDPR inventory imply that structural fumigation (specifically in California) is the predominant source of SO2F2 emissions across the U.S.

Note that we examine several other candidate predictor variables, including those with information outside of California, but none of these variables (or any linear combination of them) improve the model-data errors relative to the inversion setup with the optimal variables. For additional details on model selection of predictor variables, see Supplementary Note 4 (Supplementary Figs. 9, 10)40. In Supplementary Note 6 (Supplementary Table 3, Supplementary Figs. 15–19), we show additional inverse modeling results using alternative predictor variables to demonstrate that our results are robust across different choices of predictors in the GIM, such as land cover classifications from the National Land Cover Database (NLCD) (agreement within 7% for both U.S. and California emissions)41. Furthermore, we find that our results are consistent in magnitude and spatial distribution (agreement within 5% for U.S. emissions and within 20% for California emissions) even when we use a spatially uniform prior in the GIM, adding confidence to the spatial pattern in our results.

Reconciling top-down emissions with California state data

Our inverse model results indicate that the largest U.S. SO2F2 emissions occur in southern coastal California. This result is consistent with California state records of county-level SO2F2 use from the CDPR, which indicate that Los Angeles, Orange, and San Diego counties alone account for >50% of statewide SO2F2 use, with Los Angeles County being the largest use county23.

According to CDPR records23, SO2F2 use in California has been steadily increasing from 0.976 Gg in 2007 to 1.367 Gg in 2018, with a mean statewide use of 1.435 Gg SO2F2 yr−1 from 2015-2017. However, our inverse modeling results indicate an annual mean SO2F2 emissions rate of only 0.16–0.36 Gg yr−1, equivalent to only 11–25% of the total reported SO2F2 use in 2015–2017 (assuming accurate SO2F2 usage data reporting to CDPR). This emitted fraction is notably smaller than the previous estimate of ~2/3, which was derived from a comparison between global box model emissions estimates using AGAGE data and an estimate of global industrial SO2F2 production10.
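The emitted fraction quoted above is the top-down emissions range divided by the mean reported use. A minimal arithmetic sketch, using only values stated in the text (the variable and function names are illustrative):

```python
# Values quoted in the text; names are illustrative only.
CA_EMISSIONS_GG_PER_YR = (0.16, 0.36)   # top-down range for California
CA_REPORTED_USE_GG_PER_YR = 1.435       # mean CDPR-reported use, 2015-2017


def emitted_fraction(emissions_gg, use_gg):
    """Return emissions as a fraction of reported use."""
    return emissions_gg / use_gg


low, high = (emitted_fraction(e, CA_REPORTED_USE_GG_PER_YR)
             for e in CA_EMISSIONS_GG_PER_YR)
print(f"Emitted fraction: {low:.0%} to {high:.0%}")
```

The result depends directly on the accuracy of the reported use, which is why the text conditions it on accurate reporting to CDPR.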

The emitted fraction is important for two main reasons. First, for structural fumigation, the fraction of SO2F2 that is not absorbed or destroyed on surfaces during fumigation likely escapes to the atmosphere, contributing to the greenhouse effect10. Second, for agricultural and commodity fumigation, the absorbed fraction of SO2F2 could have implications on the amount of fluoride, sulfate, and other residues that humans and animals are exposed to in their diets42,43.

Comparison with global emissions estimates from AGAGE

Recent work by Gressent et al.5 provides an atmospheric measurement-based estimate of global mean SO2F2 emissions: ~2.89 Gg yr−1 over 2015–20175. This global emissions rate is likely accurate, as it is based on the increase of global baseline atmospheric measurements of SO2F2 assimilated into the AGAGE 12-box model10,44. Neglecting any issues with comparability between AGAGE and NOAA measurements (e.g., calibration standards), our estimate of 0.16–0.36 Gg yr−1 for California accounts for 5.5–12% of global SO2F2 emissions.

However, in contrast to our study, regional atmospheric measurements were not used in Gressent et al.5. This resulted in an estimate of large down-scaled SO2F2 emissions throughout the central and eastern U.S., a scenario not observed in the regional GGGRN measurements or our inverse model estimates. Gressent et al.5 used a proxy-based estimate to define the spatial pattern of SO2F2 emissions, constructed as the sum of (1) a structural fumigation component, which scales with population density (multiplied by a binary mask indicating whether or not a region has termites), and (2) a post-harvest treatment component, which scales with cropland fraction. In our work, however, we find that neither crop cover fraction nor population density is a skilled predictor variable of regional atmospheric measurements outside of California. The use of these variables by Gressent et al.5 to define the spatial pattern of U.S. SO2F2 emissions likely overestimates emissions for most agricultural regions and densely populated areas outside of California. It is also likely that these variables do not predict SO2F2 use in colder regions of the U.S. with lower termite risk.

California faces challenges with drywood termites

While subterranean termites are pervasive across most of the U.S., California is home to a challenging termite species that helps explain the abundance of SO2F2 fumigation in the state: the western drywood termite, Incisitermes minor (Hagen) (I. minor)8,9. Drywood termites are particularly difficult to treat because they establish colonies without having contact with the soil9. Effective fumigant-free methods that utilize bait are available for treating subterranean termites45,46, but these methods are ineffective against drywood termites, which do not forage for their food. Furthermore, I. minor (and other drywood termites, such as the West Indian drywood termite Cryptotermes brevis (Walker)) form aerial colonies in high, inaccessible parts of wooden structures, leading to situations where fumigation is the most effective eradication method9.

Notably, SO2F2 fumigation does not provide residual protection against future infestations, prompting recurring treatments of structures situated in regions with pervasive termites9. Alternatively, several non-fumigation treatment methods exist, including localized treatments such as wood replacement, wood injection with pesticides, hot (>50 °C) and cold (liquid nitrogen) temperature treatments, electrocution, and microwave radiation8,9. However, these alternative methods generally do not provide whole-structure eradication of termites. The pervasive threat of termite infestations in warm-climate regions highlights the need for the development and practice of sustainable, entomology-guided techniques for controlling urban pest populations without the release of harmful atmospheric pollutants or climate-warming gases8,9,45.

Significance and policy relevance

California’s SO2F2 emissions provide a case study on how greenhouse gas emissions that are unaccounted for in emissions inventories can potentially offset progress made towards emissions reductions. In 2006, California passed AB-32, the Global Warming Solutions Act47, which charged the California Air Resources Board (CARB) with monitoring and regulating statewide emissions sources of GHGs, and set a target of reducing statewide GHG emissions to 1990 levels by 2020. However, since the long atmospheric lifetime of SO2F2 was not discovered until 2009, SO2F2 was not included in AB-32. California renewed its commitment to emissions reductions in 2016 by passing SB-32, which expanded upon AB-32 and set a statewide GHG emissions reduction target of 40% below 1990 levels by 203048. Although CARB classified SO2F2 as a short-lived climate pollutant (SLCP) in 2016, it has yet to add the gas to the state’s annual GHG emissions inventory or its latest Climate Change Scoping Plan49,50.

From 2007-2019, California reports an average of 4.8 Tg CO2 equivalents (CO2e) yr−1 in statewide GHG emissions reductions under AB-3249. (1 Tg = 1 million metric tons (MMT).) Notably, these emissions reductions slowed and plateaued to an average of 2.25 Tg CO2e yr−1 from 2010-2015. Our inverse model results imply an annual mean SO2F2 emissions rate of 0.7–1.7 Tg CO2e yr−1 (100-yr GWP) or 1.2–2.7 Tg CO2e yr−1 (20-yr GWP) for 2015-201723. Thus, the short-term warming effect of California’s annual SO2F2 emissions, which are unaccounted for under California’s current GHG accounting protocol, is smaller yet comparable in scale to the annual statewide CO2e reductions achieved under AB-32. In other words, California could accelerate its future GHG emissions reductions by including SO2F2 in GHG emissions inventories and phasing out the use of SO2F2 in the state, and/or mandating abatement of SO2F2 emissions.
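The CO2-equivalent ranges above can be reproduced by multiplying the California emissions range (0.16–0.36 Gg yr−1) by the GWP values given in the Introduction. A short sketch of that unit conversion follows; the function and variable names are illustrative.

```python
# GWP values and emissions range are those quoted in the text.
GWP_100_YR = 4630
GWP_20_YR = 7510
CA_EMISSIONS_GG_PER_YR = (0.16, 0.36)


def co2e_tg_per_yr(emissions_gg_per_yr, gwp):
    """Convert Gg of SO2F2 per year to Tg of CO2-equivalent per year."""
    gg_co2e = emissions_gg_per_yr * gwp
    return gg_co2e / 1000.0  # 1 Tg = 1000 Gg


co2e_100 = [co2e_tg_per_yr(e, GWP_100_YR) for e in CA_EMISSIONS_GG_PER_YR]
co2e_20 = [co2e_tg_per_yr(e, GWP_20_YR) for e in CA_EMISSIONS_GG_PER_YR]
print(f"100-yr GWP: {co2e_100[0]:.1f} to {co2e_100[1]:.1f} Tg CO2e per yr")
print(f"20-yr GWP:  {co2e_20[0]:.1f} to {co2e_20[1]:.1f} Tg CO2e per yr")
```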

SO2F2 is also regulated in the U.S. by the Environmental Protection Agency (EPA) as a restricted-use pesticide (RUP) due to its inhalation toxicity17,18,43,51,52,53,54, and is currently undergoing re-registration review by the EPA55. SO2F2 is the only fumigant approved by the EPA for structural fumigation, and it is one of several options available for agricultural and commodity fumigation.

Apart from the climate-warming effect, there are public health and safety concerns surrounding the use of SO2F2 for fumigation. Most notably, there have been several documented cases of inadvertent human deaths caused by acute exposure to SO2F2, and numerous human health and safety concerns have been reported42,43,51,52,53,54,55,56,57. The EPA has set a human exposure limit of 1 part per million (ppm) SO2F2, which is the clearance level for reoccupation of a fumigated structure55. As indoor SO2F2 levels during fumigation typically exceed 1,000 ppm, ambient air levels around fumigation sites could potentially exceed the 1 ppm exposure limit. Aeration of fumigated structures has been demonstrated to occur rapidly, with over 90% of indoor SO2F2 lost to the atmosphere within the first 2 hours of ventilation58,59. Recent aerial thermal-IR spectral imaging surveys over the Los Angeles Basin have identified SO2F2 emissions plumes emanating from fumigation sites in residential neighborhoods60,61. Under the Clean Air Act62, the EPA is required to regulate emissions of hazardous air pollutants (HAPs), but SO2F2 has not been included in the list of HAPs to date.

Potential solutions

There are potential steps that fumigators and fumigation regulators can investigate to minimize the amount of SO2F2 that escapes to the atmosphere during structural fumigation procedures. For example, strong basic solutions and bio-based organic solvents have been shown to effectively absorb and destroy SO2F263,64,65,66. These SO2F2 chemical absorption methods could be utilized at the exhaust vents of fumigated structures to capture SO2F2 and prevent its escape to the atmosphere. Implementing an SO2F2 capture step in the post-fumigation venting process could result in GHG emissions reductions on the order of 1-3 Tg CO2e, while also minimizing the risk of SO2F2 exposure to nearby residents and fumigation workers.

A fumigant re-capture step may increase the upfront cost of fumigation, but doing so would also provide the global climate and environmental benefits of reducing SO2F2 emissions. However, further work is required to reduce the cost and technical barriers to these solutions. Nevertheless, the process of emitting SO2F2 directly to the atmosphere after fumigation, as is standard practice under the California Aeration Plan59, leads to a previously under-emphasized climate warming effect, and could potentially be reconsidered as an option for achieving future GHG emissions reductions.

Methods

Atmospheric measurements of SO2F2

We use atmospheric measurements of SO2F2 from the NOAA Global Monitoring Laboratory (NOAA/GML) Global Greenhouse Gas Reference Network (GGGRN)11. NOAA/GML began measuring SO2F2 in air samples collected at sites shown in Fig. 1 in 2015. Air samples were collected via programmable flask packages (PFPs) at surface and tower sites, from vertical-profiling NOAA/GML aircraft flights, and during the Atmospheric Carbon and Transport - America (ACT) and the East Coast Outflow (ECO) aircraft campaigns67,68,69,70,71. High-precision measurements of the SO2F2 mole fraction in each PFP collected from these measurement platforms were made by NOAA/GML on the PERSEUS-1 gas chromatography/mass spectrometry (GC-MS) instrument. These measurements have a mean measurement error of <0.05 ppt (within 2% of the ~2.5 ppt background). In total, we include N = 15,385 measurements of SO2F2 in our analysis (10,157 from towers and surface observatories, 5,228 from aircraft). For additional details on the NOAA/GML measurements of SO2F2, see Supplementary Note 3 (Supplementary Figs. 4-8 and Supplementary Tables 1 and 2).

Atmospheric transport model (STILT)

We use an atmospheric transport model to correlate measured atmospheric mole fractions of SO2F2 (z) with daily grid-scale surface emissions (s). Specifically, we employ the Stochastic Time-Inverted Lagrangian Transport (STILT) model with atmospheric wind fields defined by the North American Mesoscale Forecast System 12-km resolution meteorology data (NAM-12)72,73,74,75,76. We use NAM-STILT to quantitatively model SO2F2 mole fractions in the atmosphere given a map of estimated SO2F2 emissions. Using this transport model and the GIM, we optimize the emissions maps such that our modeled SO2F2 mole fractions match observations.

We use STILT to simulate the transport of an ensemble of theoretical particles released at the time and location of a given measurement backward in time in order to quantify the sensitivity of that measurement to upwind surface emissions. Particle ensembles are initialized at the time, latitude, longitude, and altitude of each measurement, and transported backwards in time at hourly time steps, with transport probabilities inferred from the NAM-12 meteorology fields.

We run STILT once for each of the 15,385 SO2F2 observations used in this study, and each simulation is run 10 days back in time to ensure that the trajectories reach the edge of the regional modeling domain. We then integrate these back-trajectories over time and volume to calculate surface influence footprints (units: ppt SO2F2 per unit emission flux), which quantify the influence of grid-scale surface fluxes on each SO2F2 measurement.

Each footprint defines the sensitivity of an individual measurement to emissions at different locations (throughout the modeling domain) and times (up to 10 days prior to the measurement). Footprints have units of ppt SO2F2 per pmol m−2 s−1 (mole fraction SO2F2 per unit surface flux), gridded at 0.25° × 0.25° latitude-longitude resolution. We estimate SO2F2 emissions at a daily time scale in this study, so we calculate footprints at a daily time resolution (yielding 10 daily footprints for each SO2F2 measurement). These daily footprints are then used as an input in the GIM.
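To make the role of the footprints concrete, the following sketch shows how daily footprints could be combined with a daily emissions map to give the modeled contribution of surface emissions to one measurement. It assumes the contribution is the sum, over days and grid cells, of footprint times flux; the function name and the toy values are hypothetical and are not taken from this study.

```python
def modeled_enhancement(daily_footprints, daily_fluxes):
    """Return the modeled SO2F2 contribution (ppt) for a single measurement.

    daily_footprints[d][k] : footprint for day d and grid cell k,
                             in ppt per (pmol m^-2 s^-1)
    daily_fluxes[d][k]     : surface flux for day d and grid cell k,
                             in pmol m^-2 s^-1
    """
    return sum(
        footprint * flux
        for fp_day, flux_day in zip(daily_footprints, daily_fluxes)
        for footprint, flux in zip(fp_day, flux_day)
    )


# Toy example (assumed values): 2 days back in time, 3 grid cells.
footprints = [
    [0.5, 0.0, 0.25],    # day of the measurement
    [0.125, 0.0, 0.0],   # one day earlier
]
fluxes = [
    [4.0, 10.0, 8.0],
    [8.0, 0.0, 2.0],
]
print(modeled_enhancement(footprints, fluxes), "ppt")
```

Note that the second grid cell emits strongly on the first day but contributes nothing, because the measurement has zero sensitivity to it. Stacking one such flattened footprint per measurement gives the rows of the sensitivity matrix used in the inversion.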

Geostatistical Inverse Model

We utilize a geostatistical inverse model (GIM) to infer surface fluxes of SO2F2 (s) from atmospheric measurements (z)28,30,36. For additional examples of studies that use GIMs to estimate trace gas emissions, see the following references30,31,32,34,36,77,78,79.

The NOAA GGGRN measurements of SO2F2 are the principal data constraint in our model. These observations z (dimensions n × 1) are modeled as:

$$\boldsymbol{z}=\mathbf{H}\boldsymbol{s}+\boldsymbol{\varepsilon}$$
(1)

In our notation, we use bold font to denote matrices and bold italicized font to denote vectors. In Eq. (1), H (dimensions n × m) is a sensitivity matrix describing the response of measured SO2F2 mole fractions to modeled surface emissions (i.e. Hi,j = ∂zi/∂sj), defined by the STILT footprints (discussed above and in the main text). s (dimensions m × 1) are the unknown surface emissions for each model grid box on each day of the time series (m = NdaysNlatNlon). The variable ε (dimensions n × 1) is a vector of the model-data residuals, which include the aggregate effects of measurement error, transport error, representation error, and aggregation error. By construction, ε follows a multivariate normal distribution \(\mathcal{N}(0,\mathbf{R})\) with a mean of zero and covariances defined by the matrix R.

The covariance matrix R (dimensions n × n) describes the expected magnitude of model-data errors due to measurement, transport, representation, and aggregation.

$$\mathbf{R}=E\left[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^{T}\right]=E\left[\left(\boldsymbol{z}-\mathbf{H}\boldsymbol{s}\right)\left(\boldsymbol{z}-\mathbf{H}\boldsymbol{s}\right)^{T}\right]$$
(2)

Here, E[] indicates the expected value of a variable and T indicates the matrix transpose. In general, R can be thought of as a control on how precisely Hs must match the atmospheric measurements z. Notably, there are tradeoffs associated with different configurations of R. For instance, using a value for σR that is too small runs the risk of overfitting the data, while using a value for σR that is too large can lead to model estimates that do not adequately reproduce the measurements. In this study, we construct R as a diagonal matrix, with diagonal elements \({\sigma }_{R}^{2}\). The off-diagonal elements of R describe spatial and temporal covariances in the emissions, which we assume to be zero in this study since SO2F2 emissions are transient and arise rapidly from non-stationary point sources58,61.

We run case studies using both a uniform value for the diagonal elements of R and a non-uniform setup where the diagonal elements vary by measurement site, proportional to the observed variance in the enhancements at each site80. The results of this sensitivity study are shown in Supplementary Note 6 (Supplementary Table 3 and Supplementary Figs. 15–19), and demonstrate that the resulting emissions pattern is not very sensitive to whether R is constructed as a scalar matrix defined by a single parameter or as a diagonal matrix with elements that vary by measurement site. Using an R matrix with variances that differ by measurement site often results in a more realistic quantification of the uncertainties associated with each site, but the tradeoff is that doing so risks over-parameterizing the GIM. We include these case studies to show that the inversion results presented in this study are not highly dependent on the definition of R.
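The two configurations of R described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function names, the `scale` factor, the site labels, and the toy enhancement values are all hypothetical.

```python
import numpy as np

def build_R_uniform(n_obs, sigma_R):
    """Scalar-matrix R: the same error variance sigma_R**2 for every observation."""
    return (sigma_R ** 2) * np.eye(n_obs)

def build_R_by_site(site_ids, enhancements, scale=1.0):
    """Diagonal R whose entries are proportional to the observed variance
    of the enhancements at each measurement site (scale is hypothetical)."""
    site_ids = np.asarray(site_ids)
    enhancements = np.asarray(enhancements, dtype=float)
    diag = np.empty(enhancements.shape[0])
    for site in np.unique(site_ids):
        mask = site_ids == site
        diag[mask] = scale * np.var(enhancements[mask])
    return np.diag(diag)

# Toy data: two hypothetical sites, "B" being far more variable than "A".
site_ids = np.array(["A", "A", "A", "B", "B", "B"])
enh = np.array([0.1, 0.3, 0.2, 1.0, 3.0, 2.0])

R_uniform = build_R_uniform(len(enh), sigma_R=0.5)
R_site = build_R_by_site(site_ids, enh)
```

In both cases the off-diagonal elements are zero, so the only difference is whether every observation shares one variance or inherits the variance of its site.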

The unknown surface emissions s are modeled as the sum of a deterministic term (Xβ) and a stochastic term ξ (dimensions m × 1)36,81:

$${{{{{{{\boldsymbol{s}}}}}}}}={{{{{{{\bf{X}}}}}}}}{{{{{{{\boldsymbol{\beta }}}}}}}}+{{{{{{{\boldsymbol{\xi }}}}}}}}$$
(3)

The deterministic term is defined by the matrix X (dimensions m × p), which includes p predictor variables, or covariate datasets, that help describe spatial patterns in the estimated fluxes, along with unknown corresponding weights β (dimensions p × 1). The deterministic term can be thought of as a weighted linear combination of different spatial datasets that define our inversion prior. The stochastic term ξ, by contrast, describes grid-scale patterns in the modeled fluxes that are not adequately described by the explanatory variables in X. By construction, ξ follows a multivariate normal distribution \({\mathcal{N}}(0,{\bf{Q}})\) with a mean of zero and covariance matrix Q (dimensions m × m). Both ξ and β, and therefore s, are not known a priori and are optimized in the inversion.

Similarly to R, the flux deviation matrix Q characterizes how much the estimated emissions s deviate from the deterministic model:

$${\bf{Q}}=E\left[{\boldsymbol{\xi }}{{\boldsymbol{\xi }}}^{T}\right]=E\left[\left({\boldsymbol{s}}-{\bf{X}}{\boldsymbol{\beta }}\right){\left({\boldsymbol{s}}-{\bf{X}}{\boldsymbol{\beta }}\right)}^{T}\right]$$
(4)

We assume that deviations of s from the deterministic model are uncorrelated, and so Q can be constructed as a diagonal matrix with diagonal elements \({\sigma }_{Q}^{2}\). The off-diagonal elements of Q represent spatial covariances in the emissions, which we assume to be zero in this study since SO2F2 emissions events typically occur rapidly from non-stationary point sources58,61.
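To make the structure of Eq. (3) concrete, the sketch below draws one realization of the emissions as a deterministic term plus an uncorrelated stochastic term. The covariate values, the weights `beta`, and `sigma_Q` are made-up numbers chosen for illustration only; in the inversion itself β and ξ are unknowns, not inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 6  # number of unknown fluxes (grid boxes x days), kept tiny here

# X (m x p): a constant column plus one hypothetical spatial covariate.
X = np.column_stack([np.ones(m), np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])])
beta = np.array([0.5, 0.2])  # illustrative weights (p x 1)

# Diagonal Q: deviations from the deterministic model are uncorrelated.
sigma_Q = 0.1
Q = sigma_Q ** 2 * np.eye(m)

# One draw of the stochastic term xi ~ N(0, Q), then s = X beta + xi.
xi = rng.multivariate_normal(np.zeros(m), Q)
s = X @ beta + xi
```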

The geostatistical approach to solving the inverse problem requires a formulation of the maximum-likelihood posterior emissions distribution. Following Bayes’ Theorem on conditional probabilities, the posterior probability distribution of the unknown surface emissions conditional on the atmospheric measurements \(p\left({{{{{{{\boldsymbol{s}}}}}}}},{{{{{{{\boldsymbol{\beta }}}}}}}}| {{{{{{{\boldsymbol{z}}}}}}}}\right)\) can be written as

$$p\left({{{{{{{\boldsymbol{s}}}}}}}},{{{{{{{\boldsymbol{\beta }}}}}}}}| {{{{{{{\boldsymbol{z}}}}}}}}\right)\propto p\left({{{{{{{\boldsymbol{z}}}}}}}}| {{{{{{{\boldsymbol{s}}}}}}}},{{{{{{{\boldsymbol{\beta }}}}}}}}\right)p\left({{{{{{{\boldsymbol{s}}}}}}}}| {{{{{{{\boldsymbol{\beta }}}}}}}}\right)p\left({{{{{{{\boldsymbol{\beta }}}}}}}}\right)$$
(5)

The symbol ∝ here means “is proportional to.” The first term in Eq. (5), the probability distribution of the atmospheric measurements conditional on the surface emissions \((p\left({\boldsymbol{z}}| {\boldsymbol{s}},{\boldsymbol{\beta }}\right))\), can be derived from Eqs. (1) and (2):

$$p\left({{{{{{{\boldsymbol{z}}}}}}}}| {{{{{{{\boldsymbol{s}}}}}}}},{{{{{{{\boldsymbol{\beta }}}}}}}}\right)\propto \exp \left(-\frac{1}{2}{\left({{{{{{{\boldsymbol{z}}}}}}}}-{{{{{{{\bf{H}}}}}}}}{{{{{{{\boldsymbol{s}}}}}}}}\right)}^{T}{{{{{{{{\bf{R}}}}}}}}}^{-1}\left({{{{{{{\boldsymbol{z}}}}}}}}-{{{{{{{\bf{H}}}}}}}}{{{{{{{\boldsymbol{s}}}}}}}}\right)\right)$$
(6)

The second term in Eq. (5), the probability distribution of the surface emissions conditional on the coefficients \((p\left({{{{{{{\boldsymbol{s}}}}}}}}| {{{{{{{\boldsymbol{\beta }}}}}}}}\right))\), can be derived from Eqs. (3) and (4):

$$p\left({{{{{{{\boldsymbol{s}}}}}}}}| {{{{{{{\boldsymbol{\beta }}}}}}}}\right)\propto \exp \left(-\frac{1}{2}{\left({{{{{{{\boldsymbol{s}}}}}}}}-{{{{{{{\bf{X}}}}}}}}{{{{{{{\boldsymbol{\beta }}}}}}}}\right)}^{T}{{{{{{{{\bf{Q}}}}}}}}}^{-1}\left({{{{{{{\boldsymbol{s}}}}}}}}-{{{{{{{\bf{X}}}}}}}}{{{{{{{\boldsymbol{\beta }}}}}}}}\right)\right)$$
(7)

As in previous GIM studies, we assume the probability distribution of β is uniform across all values:

$$p\left({{{{{{{\boldsymbol{\beta }}}}}}}}\right)\propto 1$$
(8)

Thus, Eq. (5) can be rewritten as

$$p\left({{{{{{{\boldsymbol{s}}}}}}}},{{{{{{{\boldsymbol{\beta }}}}}}}}| {{{{{{{\boldsymbol{z}}}}}}}}\right)\propto \exp \left(-\frac{1}{2}{\left({{{{{{{\boldsymbol{z}}}}}}}}-{{{{{{{\bf{H}}}}}}}}{{{{{{{\boldsymbol{s}}}}}}}}\right)}^{T}{{{{{{{{\bf{R}}}}}}}}}^{-1}\left({{{{{{{\boldsymbol{z}}}}}}}}-{{{{{{{\bf{H}}}}}}}}{{{{{{{\boldsymbol{s}}}}}}}}\right)-\frac{1}{2}{\left({{{{{{{\boldsymbol{s}}}}}}}}-{{{{{{{\bf{X}}}}}}}}{{{{{{{\boldsymbol{\beta }}}}}}}}\right)}^{T}{{{{{{{{\bf{Q}}}}}}}}}^{-1}\left({{{{{{{\boldsymbol{s}}}}}}}}-{{{{{{{\bf{X}}}}}}}}{{{{{{{\boldsymbol{\beta }}}}}}}}\right)\right)$$
(9)

The best estimate of the unknown emissions \(\hat{{\boldsymbol{s}}}\) can be obtained by maximizing the posterior probability distribution \(p\left({\boldsymbol{s}},{\boldsymbol{\beta }}| {\boldsymbol{z}}\right)\) in Eq. (9). Equivalently, one can minimize the negative logarithm of \(p\left({\boldsymbol{s}},{\boldsymbol{\beta }}| {\boldsymbol{z}}\right)\), which yields the cost function \({L}_{s,\beta }\)30,36:

$${L}_{s,\beta }=\frac{1}{2}{({{{{{{{\boldsymbol{z}}}}}}}}-{{{{{{{\bf{H}}}}}}}}{{{{{{{\boldsymbol{s}}}}}}}})}^{T}{{{{{{{{\bf{R}}}}}}}}}^{-1}({{{{{{{\boldsymbol{z}}}}}}}}-{{{{{{{\bf{H}}}}}}}}{{{{{{{\boldsymbol{s}}}}}}}})+\frac{1}{2}{({{{{{{{\boldsymbol{s}}}}}}}}-{{{{{{{\bf{X}}}}}}}}{{{{{{{\boldsymbol{\beta }}}}}}}})}^{T}{{{{{{{{\bf{Q}}}}}}}}}^{-1}({{{{{{{\boldsymbol{s}}}}}}}}-{{{{{{{\bf{X}}}}}}}}{{{{{{{\boldsymbol{\beta }}}}}}}})$$
(10)
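Eq. (10) translates almost line for line into code. The following is a minimal sketch with a hypothetical function name and toy matrices; it uses `np.linalg.solve` rather than explicit inverses of R and Q.

```python
import numpy as np

def gim_cost(s, beta, z, H, X, R, Q):
    """Evaluate the cost function of Eq. (10) for candidate s and beta."""
    resid = z - H @ s   # model-data mismatch, weighted by R^-1
    dev = s - X @ beta  # departure from the deterministic model, weighted by Q^-1
    data_term = 0.5 * resid @ np.linalg.solve(R, resid)
    prior_term = 0.5 * dev @ np.linalg.solve(Q, dev)
    return data_term + prior_term

# Toy problem: n = 2 observations, m = 3 fluxes, p = 1 covariate.
H = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
X = np.ones((3, 1))
R = 0.25 * np.eye(2)
Q = 0.04 * np.eye(3)
beta = np.array([2.0])

s_prior = X @ beta     # emissions that match the deterministic model exactly
z_exact = H @ s_prior  # observations those emissions would produce

cost_at_prior = gim_cost(s_prior, beta, z_exact, H, X, R, Q)
cost_perturbed = gim_cost(s_prior + 0.1, beta, z_exact, H, X, R, Q)
```

The cost vanishes when the emissions reproduce the observations and coincide with the deterministic model, and grows as either condition is violated.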

The best estimate of the modeled fluxes \(\hat{{\boldsymbol{s}}}\) and corresponding regression coefficients \(\hat{{\boldsymbol{\beta }}}\) are obtained by minimizing Eq. (10). We minimize the cost function \({L}_{s,\beta }\) by taking its derivative with respect to s and β and setting each derivative equal to zero. Numerous methods have been utilized for obtaining \(\hat{{\boldsymbol{s}}}\) computationally, but in this study we follow the direct approach described by Miller et al. (2020) and preceding work30,36,82. Instead of solving the linear equations for \(\hat{{\boldsymbol{s}}}\) and \(\hat{{\boldsymbol{\beta }}}\) directly, we can simplify these equations by introducing an unknown vector of weights ζ (dimensions n × 1):

$$\hat{{{{{{{{\boldsymbol{s}}}}}}}}}={{{{{{{\bf{X}}}}}}}}{{{{{{{\boldsymbol{\beta }}}}}}}}+{{{{{{{\bf{Q}}}}}}}}{{{{{{{{\bf{H}}}}}}}}}^{T}{{{{{{{\boldsymbol{\zeta }}}}}}}}$$
(11)

The unknown vector ζ and the unknown regression coefficients β can then be obtained simultaneously by solving the following linear system of equations:

$$\left[\begin{array}{cc}{{{{{{{\bf{H}}}}}}}}{{{{{{{\bf{Q}}}}}}}}{{{{{{{{\bf{H}}}}}}}}}^{T}+{{{{{{{\bf{R}}}}}}}}&{{{{{{{\bf{H}}}}}}}}{{{{{{{\bf{X}}}}}}}}\\ {\left({{{{{{{\bf{HX}}}}}}}}\right)}^{T}&0\end{array}\right]\left[\begin{array}{c}{{{{{{{\boldsymbol{\zeta }}}}}}}}\\ {{{{{{{\boldsymbol{\beta }}}}}}}}\end{array}\right]=\left[\begin{array}{c}{{{{{{{\boldsymbol{z}}}}}}}}\\ {{{{{{{\bf{0}}}}}}}}\end{array}\right]$$
(12)
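As a sketch of the intermediate algebra linking Eq. (10) to Eqs. (11) and (12) (our own reconstruction rather than a quotation from the cited references), setting the derivatives of the cost function to zero gives

$$\frac{\partial {L}_{s,\beta }}{\partial {\boldsymbol{s}}}=-{{\bf{H}}}^{T}{{\bf{R}}}^{-1}\left({\boldsymbol{z}}-{\bf{H}}{\boldsymbol{s}}\right)+{{\bf{Q}}}^{-1}\left({\boldsymbol{s}}-{\bf{X}}{\boldsymbol{\beta }}\right)={\bf{0}},\qquad \frac{\partial {L}_{s,\beta }}{\partial {\boldsymbol{\beta }}}=-{{\bf{X}}}^{T}{{\bf{Q}}}^{-1}\left({\boldsymbol{s}}-{\bf{X}}{\boldsymbol{\beta }}\right)={\bf{0}}$$

If the weights are identified as \({\boldsymbol{\zeta }}={{\bf{R}}}^{-1}\left({\boldsymbol{z}}-{\bf{H}}{\boldsymbol{s}}\right)\) (an assumption made here for illustration), the first condition rearranges to Eq. (11). Substituting Eq. (11) into this definition of ζ and into the second condition then yields the two block rows of Eq. (12):

$$\left({\bf{H}}{\bf{Q}}{{\bf{H}}}^{T}+{\bf{R}}\right){\boldsymbol{\zeta }}+{\bf{H}}{\bf{X}}{\boldsymbol{\beta }}={\boldsymbol{z}},\qquad {\left({\bf{H}}{\bf{X}}\right)}^{T}{\boldsymbol{\zeta }}={\bf{0}}$$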

After solving for β and ζ, one can compute \(\hat{{{{{{{{\boldsymbol{s}}}}}}}}}\) directly by substituting these terms into Eq. (11).
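The solution procedure in Eqs. (11) and (12) can be sketched with dense linear algebra as below. This is an illustrative implementation on a small synthetic problem, not the code used in this study; the function name `solve_gim`, the problem sizes, and all numerical values are assumptions, and a realistic inversion would likely need sparse or matrix-free operations.

```python
import numpy as np

def solve_gim(z, H, X, R, Q):
    """Solve the block system of Eq. (12), then apply Eq. (11)."""
    n, p = z.shape[0], X.shape[1]
    HQ = H @ Q  # n x m
    HX = H @ X  # n x p
    lhs = np.block([[HQ @ H.T + R, HX],
                    [HX.T, np.zeros((p, p))]])
    rhs = np.concatenate([z, np.zeros(p)])
    sol = np.linalg.solve(lhs, rhs)
    zeta, beta_hat = sol[:n], sol[n:]
    # Q is symmetric, so Q H^T zeta equals (HQ)^T zeta.
    s_hat = X @ beta_hat + HQ.T @ zeta
    return s_hat, beta_hat, zeta

# Synthetic test problem (all values hypothetical).
rng = np.random.default_rng(42)
n, m = 8, 5
H = rng.random((n, m))
X = np.column_stack([np.ones(m), np.arange(m, dtype=float)])
R = 0.05 ** 2 * np.eye(n)
Q = 0.3 ** 2 * np.eye(m)
s_true = X @ np.array([1.0, 0.5]) + rng.normal(0.0, 0.3, m)
z = H @ s_true + rng.normal(0.0, 0.05, n)

s_hat, beta_hat, zeta = solve_gim(z, H, X, R, Q)
```

Solving the (n + p) × (n + p) system once avoids forming the inverses of R and Q explicitly, which is one reason this formulation is convenient when the number of observations is much smaller than the number of unknown fluxes.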

Computing model errors

We follow the methodology described in Kitanidis et al. (1996) for calculating posterior uncertainties (\({{{{{{{{\bf{V}}}}}}}}}_{\hat{{{{{{{{\boldsymbol{s}}}}}}}}}}\)) corresponding to the best estimate solution to the geostatistical inverse problem (\(\hat{{{{{{{{\boldsymbol{s}}}}}}}}}\))29,83.

$${{\bf{V}}}_{\hat{{\boldsymbol{s}}}}={\bf{Q}}-{({\bf{HQ}})}^{T}{\bf{P}}({\bf{HQ}})-{\bf{X}}{\bf{B}}{{\bf{X}}}^{T}-{\bf{X}}{{\bf{A}}}^{T}({\bf{HQ}})-{({\bf{HQ}})}^{T}{\bf{A}}{{\bf{X}}}^{T}$$
(13)

The unknown matrices P, A, and B are all computed by inverting the following matrix, as described in Kitanidis et al. (1996)29:

$${\left[\begin{array}{cc}{{{{{{{\bf{H}}}}}}}}{{{{{{{\bf{Q}}}}}}}}{{{{{{{{\bf{H}}}}}}}}}^{T}+{{{{{{{\bf{R}}}}}}}}&{{{{{{{\bf{H}}}}}}}}{{{{{{{\bf{X}}}}}}}}\\ {\left({{{{{{{\bf{HX}}}}}}}}\right)}^{T}&0\end{array}\right]}^{-1}=\left[\begin{array}{cc}{{{{{{{\bf{P}}}}}}}}&{{{{{{{\bf{A}}}}}}}}\\ {{{{{{{{\bf{A}}}}}}}}}^{T}&{{{{{{{\bf{B}}}}}}}}\end{array}\right]$$
(14)

After solving for P, A, and B, the posterior uncertainties \({{{{{{{{\bf{V}}}}}}}}}_{\hat{{{{{{{{\boldsymbol{s}}}}}}}}}}\) can be computed directly using Eq. (13). As the daily grid-scale errors can be quite large, we report emissions and corresponding uncertainties summed over a multi-year time interval and over large geographic regions. Previous work has shown that uncertainties decrease exponentially when aggregated over time and space34,35. Thus, in this work we report emissions and uncertainties for three large regions of the U.S.: (1) California only, (2) All continental U.S. states except California, and (3) All continental U.S. states. With respect to time, we report emissions and uncertainties as multi-year averages (2015–2017 for inversions including LAC tower data, and 2015–2019 for inversions including MWO data).
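Eqs. (13) and (14) can be sketched in the same dense form. The function names and the synthetic inputs below are hypothetical. The aggregation step assumes that a regional or multi-day total is a weighted sum of the elements of s, so that its variance is \({{\boldsymbol{w}}}^{T}{{\bf{V}}}_{\hat{{\boldsymbol{s}}}}{\boldsymbol{w}}\); this is a standard propagation rule and is not necessarily the exact procedure used for the regional totals reported here.

```python
import numpy as np

def posterior_covariance(H, X, R, Q):
    """Posterior covariance of s_hat following Eqs. (13) and (14)."""
    n, p = H.shape[0], X.shape[1]
    HQ = H @ Q
    HX = H @ X
    lhs = np.block([[HQ @ H.T + R, HX],
                    [HX.T, np.zeros((p, p))]])
    inv = np.linalg.inv(lhs)
    P, A, B = inv[:n, :n], inv[:n, n:], inv[n:, n:]
    return (Q - HQ.T @ P @ HQ - X @ B @ X.T
            - X @ A.T @ HQ - HQ.T @ A @ X.T)

def aggregated_sigma(V, weights):
    """1-sigma uncertainty of the weighted sum w^T s (assumed aggregation rule)."""
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(max(w @ V @ w, 0.0)))

# Synthetic inputs (all values hypothetical).
rng = np.random.default_rng(1)
n, m = 8, 5
H = rng.random((n, m))
X = np.column_stack([np.ones(m), np.arange(m, dtype=float)])
R = 0.05 ** 2 * np.eye(n)
Q = 0.3 ** 2 * np.eye(m)

V = posterior_covariance(H, X, R, Q)
sigma_cells = np.sqrt(np.clip(np.diag(V), 0.0, None))  # per-element uncertainty
sigma_total = aggregated_sigma(V, np.ones(m))          # uncertainty of the total
```

Because the off-diagonal elements of the posterior covariance enter the aggregated variance, the uncertainty of a total is generally not the simple sum of the per-element uncertainties.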