Introduction

Nitrogen dioxide (NO2) is a reactive, short-lived atmospheric trace gas that is formed naturally by lightning and emitted by anthropogenic activities that generate heat in the atmosphere, such as fossil fuel use and biomass burning (BB)1,2. NO2 reacts in the atmosphere with OH and clouds to form nitrate aerosol, an important fraction of PM2.5 and an important agent in coating and removing black carbon (BC) from the atmosphere3. NO2 also has a very short lifetime in the atmosphere, rapidly exchanging with NO as described by the pseudo steady-state assumption4. Therefore, studies generally consider the sum of NO and NO2, hereafter referred to as NOx. Both the ratio of NO2 to NO and the total loading of NOx, in combination with heat, ultraviolet radiation, and either carbon monoxide (CO) or volatile organic compounds (VOCs), lead to the production of ozone (O3)5. Determining the emissions and the in-situ processes affecting both NO2 and NOx is therefore essential for understanding their atmospheric distribution as well as their impact on multiple atmospheric, environmental, and climatological phenomena6,7.

The column loading of NO2 can be readily measured by analyzing backscattered blue sunlight with differential optical absorption spectroscopy from the space-based, downward-looking platform OMI8. The relationship between these absorption spectra and the in-situ atmospheric column loadings of NO2 can be evaluated quantitatively within a degree of uncertainty9. Some recent works have investigated NO2 from OMI directly, with only a few venturing to other forms of nitrogen, including a study connecting column NO2 and surface NOx using big data, and a study connecting NO2 and ammonia with a model10,11.

There are many emission inventories currently used by the scientific community that provide emissions of NOx and other co-emitted species such as CO and BC. The first way in which such datasets are derived is a bottom-up approach, which relies on aggregating economic, population, and other factors, together with emission factors obtained from idealized cases or events rather than from the atmosphere in-situ, as is done by EDGAR12,13. Another approach is to scale bottom-up emissions datasets in tandem with remotely sensed measurements of land-use change or fire radiative power at higher temporal and spatial frequency, and then couple these with emission and mass factors based on small-scale studies which are built from the bottom up, as is done by FINN14. These approaches rely upon a limited number of measurements from the laboratory or from limited field experiments15,16, strongly constraining their results based on a priori information. Therefore, these emissions datasets do not adapt well to new or rapidly changing sources, or when the underlying factors impacting emissions change17,18. Furthermore, rapid changes frequently impact the quality or availability of the measurements themselves, including changes in albedo, land use, rain and clouds, and high loadings of aerosols from upwind fires, leading to hard-to-scale uncertainties19,20.

Some works have scaled bottom-up emissions changes directly to column loadings from satellites, using approaches ranging from simple scaling to complex models and data assimilation techniques, although in general these are not able to improve datasets in which the a priori is zero21,22. In these cases, the top-down results tend to be biased by the spatial and temporal assumptions underlying the a priori emissions and are strongly impacted by mis-characterization of the measurement uncertainty23,24,25. Direct top-down estimation approaches are generally applied only to long-lived gases (CH4, CFCs, and N2O), since their chemical decay is very slow compared with their transport, so that any observed perturbation larger than the measurement uncertainty can be attributed to an emissions source, which can then be inverted using the meteorological fields26. There is only a limited set of papers addressing short-lived species, and always under idealized conditions: either where there is a strong single point source surrounded by an otherwise source-free region27,28, or using an underlying model to interactively approximate the chemical and transport properties of the short-lived species on average over a long period of time, and then using these average conditions to make an average inversion over a single geographic region under climatologically similar temporal conditions29.

Analyzing spatial and temporal extremes of remotely sensed measurements, and attributing the variability in connection with a rigorous analysis of measurement and physical process error, has also been used to demonstrate the spatial and temporal extent of missing emissions, albeit under limited conditions30,31. The frequency and spatial distribution of time-varying aerosol sources in Asia were identified using observations from MISR32 and validated by independent measurements from CALIOP, OMI, and MOPITT21,33. A top-down Kalman Filter approach was utilized to estimate global-scale BC emissions28,34. Remotely sensed measurements of CO from MOPITT were used to constrain the temporal and spatial distribution of missing emissions sources, as well as a pseudo-magnitude33. An improved representation of missing biomass burning and urban pollution sources at the daily scale was attempted using mathematical techniques of intermediate complexity, combining empirical orthogonal functions and plume rise modeling, to determine the variability and vertical distribution of observed extremes of CO and NO2 over a decade-long period20,27. There have also been attempts to constrain the chemical lifetime by making simple assumptions such as Gaussian plumes or diffusion-based transport35,36,37. What none of these has been able to accomplish is a direct calculation of emissions, free from a priori, under highly complex atmospheric, surface, and anthropogenic forcing conditions, with a focus on day-to-day extreme events.

This work utilizes a unique, fast, first-order approach using daily measurements of remotely sensed NO2, winds, and mass-conserving estimates of in-situ chemical and physical processing to estimate the daily NOx emissions of extreme events associated with both biomass burning and rapid changes in urbanization. This paper adds five aspects: first, the process quantifies the sources associated with extreme events on a day-to-day basis (with groups of extreme events ranging from 15 to 50 days); second, the approach uses measurements over heterogeneous geographical regions that include mountains, coasts, urban regions, forests, agricultural land, and various atmospheric phenomena including high cloud cover, high aerosol cover, large temperature gradients, extremes in surface UV radiation, a substantial contribution of both land and sea, and considerable vertical atmospheric mixing; third, the approach is able to capture sizable NOx sources associated with both biomass burning and urbanization, ranging from hyper-modern Singapore to least developed countries like Laos; fourth, the approach does not rely on any chemical transport or climate models; and fifth, the approach is not limited to a single season or relatively consistent geographic region. Furthermore, by analyzing the first-order approximations of thermodynamical, chemical, and transport factors, this work presents a holistic and unequivocal approach to quantify both known and missing sources, as well as their uncertainty.

Results

Land surface characteristics over the domain from 5°S to 32°N and 85°E to 130°E, covering most of Continental Southeast Asia (CSA), large parts of Southern and Central China (SCC), Northeast India (NEI), and Bangladesh, are given in Fig. 1a. This region is geographically heterogeneous, with elevations ranging from over 8000 m to the surface, climate types ranging from third pole to tropical, and vegetation types ranging from tropical rainforest through savanna, as well as substantial ocean cover. Fig. 1b shows the climatological standard deviation of 2016 daily OMI NO2 observations, which has been previously shown to indicate changes in source amount and variability18,38. The highest values are found in existing urban areas including conurbations (Shanghai to Nanjing, Hefei, and Hangzhou [YRD] and Hong Kong to Shenzhen and Guangzhou [PRD]), large urban basins of SCC, Dhaka, Bangkok, Hanoi, Ho Chi Minh City, and Singapore, as well as in rural regions in the mountains of NEI and CSA. This indicates that there are substantial changes in NO2 occurring over short periods, both over areas which are usually polluted and over areas which are usually clean, and therefore that both urbanization and biomass burning are sizable sources32,39.

Fig. 1: An overview of land-use categories and NO2 observations.

Spatial distribution of (a) different land-use categories and (b) climatological standard deviation of daily remotely sensed column NO2 [molecules cm−2].
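
As a minimal sketch of how a climatological standard deviation such as that in Fig. 1b could be computed, the following assumes that the daily OMI columns are stored as a NumPy array with NaN marking missing retrievals. The function name, the array layout, and the `min_days` threshold are illustrative assumptions and are not taken from this work.

```python
import numpy as np


def climatological_std(no2_daily, min_days=2):
    """Per-grid standard deviation over time, skipping missing days.

    no2_daily: array (n_days, n_lat, n_lon) of column NO2 [molecules cm-2],
               with NaN where no retrieval is available (assumed layout).
    min_days:  hypothetical minimum number of valid days per grid cell.
    """
    x = np.asarray(no2_daily, dtype=float)
    valid = np.isfinite(x)
    n = valid.sum(axis=0)

    # Mean over valid days only; cells without any data stay NaN.
    total = np.where(valid, x, 0.0).sum(axis=0)
    mean = np.divide(total, n, out=np.full(n.shape, np.nan), where=n > 0)

    # Population variance over valid days only.
    sq_dev = np.where(valid, (x - mean) ** 2, 0.0).sum(axis=0)
    var = np.divide(sq_dev, n, out=np.full(n.shape, np.nan), where=n >= min_days)
    return np.sqrt(var)
```

Grid cells with fewer than `min_days` valid retrievals are returned as NaN rather than as zero, so that persistent cloud cover is not mistaken for low variability.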

Spatial and temporal features of extreme conditions

A variance maximization technique has been applied to the daily NO2 column measurements over the entire domain, yielding orthogonal spatial and temporal patterns that contribute the maximum amount of variation in the field. Emissions and changes in chemistry, transport, and thermodynamics are strongly related to the change in variance, with the stronger signals being easier to decompose and analyze31,32. The four spatial [EOF] and temporal [PC] modes contributing the most variation of the observed daily NO2 fields contribute 18.8%, 8.3%, 7.4%, and 3.5% of the total variability, with the spatial patterns given by the EOFs in Fig. 2a–d and the temporal patterns given by the PCs in Fig. 2e.
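
One common way to obtain such orthogonal modes is a singular value decomposition of the time-by-space anomaly matrix. The sketch below illustrates that generic technique only; it assumes a gap-free array with one row per day and one column per grid cell, and the function name is hypothetical. Any area weighting or gap filling used in this work is not described in this section and is not reproduced here.

```python
import numpy as np


def eof_analysis(field, n_modes=4):
    """Leading EOFs, PCs, and explained-variance fractions via SVD.

    field:   array (n_days, n_grid) without NaNs (assumed pre-processing).
    n_modes: number of leading modes to return.
    """
    field = np.asarray(field, dtype=float)

    # Remove the time mean at every grid cell so that modes describe variability.
    anomalies = field - field.mean(axis=0, keepdims=True)

    # Rows of vt are orthonormal spatial patterns; u * s gives their time series.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)

    explained = s**2 / np.sum(s**2)
    eofs = vt[:n_modes]               # (n_modes, n_grid)
    pcs = u[:, :n_modes] * s[:n_modes]  # (n_days, n_modes)
    return eofs, pcs, explained[:n_modes]
```

The returned fractions are sorted in decreasing order, so the first four entries play the role of the percentages quoted above.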

Fig. 2: Spatial and temporal features extracted from NO2 observations via variance maximization method.

a–d Spatial distributions of the four main EOFs and (e) the corresponding time series of the PCs.

The first EOF demonstrates a signal in the urban areas in the PRD, YRD, and SCC, with additional moderate signals in Xiamen, Taizhong, Dhaka, Bangkok, and cities in Eastern India. Additionally, it demonstrates contribution from remote areas in Myanmar, Northern Thailand, and Northwestern Laos (Supplementary Fig. 1)40. The first PC is observed from days 45 to 90, and again from days 300 to 366, with a few individual peaks observed between day 1 and day 30.

The second EOF shows a signal across the biomass burning areas in CSA including Myanmar, Northern Laos, and Northern Thailand, and extending into the remote regions surrounding the PRD, Yunnan, NEI and Northern Bangladesh. There are smaller signals observed in the urban YRD, parts of Southeastern China, the Malaysian Peninsula, Dhaka and Bangkok. The second PC is concentrated from days 60 to 120, and again from days 300 to 366.

The third and fourth EOFs are weaker, although they still have some regions with notable signal. The third EOF displays an urban signal in the PRD, YRD, Central China, NEI, and Bangladesh, while also showing a weak signal in the biomass burning areas in CSA. The third PC has sufficient magnitude, but only occurs over isolated peaks banded from days 1 to 90 and again from days 300 to 366. The fourth EOF outlines biomass burning areas west of 102°E, including Myanmar, Western Thailand, and some parts of NEI, as well as Laos. The fourth PC has only a few days’ worth of peaks, all occurring from days 60 to 120.

The geographic and temporal features of the EOFs and PCs allow the map to be divided into urban and non-urban (including clean, biomass burning, and mixed) regions as demonstrated in Fig. 1 and Table 1, following variance maximization27,32. For non-urban areas, this work focuses on the mountainous tropical forests, agricultural lands, and densely populated non-urban areas found throughout CSA and NEI, regions with an annually occurring intense wet season and extended dry season during which time most fires will occur or expand41,42.

Table 1 NOx Emission inventory source used in different kinds of regions

The specific days and locations are chosen based on the variance maximization analysis of the underlying OMI NO2 column measurements, so that they cover the spatial-temporal extent of the largest changes in emissions or in-situ processing of NOx as demonstrated in Fig. 2.

The time series of the NO2 weighted column loading over the portions of EOF1 and EOF3 corresponding to BB2 and the portions of EOF1 corresponding to Urban1 and Urban2 (see Fig. 1) are displayed with the respective spatial and temporal cutoffs in Fig. 3. The signal within central Myanmar, Northern Thailand and Northern Laos occurs on a total of 30 days between days 60 and 110, while the signal throughout most of Myanmar and Laos occurs on a total of 39 days from day 60 to day 120, indicating multiple stable phases of biomass burning occurring in different geographic areas at different times, with considerable overlap in Eastern Myanmar and Northern Laos, but not in other regions. This clearly indicates that there are distinct anthropogenic and natural driving forces behind the burning. Similarly, in the urban areas, the NO2 weighted column loading signal over the PRD and SCC occurs on 36 days at random from days 1 to 110 and from day 300 to day 360; over the PRD and parts of Fujian and Taiwan Island the signal occurs on 25 days at random from days 300 to 366 and on another 11 days completely at random throughout the remainder of the year.

Fig. 3: Spatial and temporal patterns of typical extremes.

Different EOF distributions and PC time series aggregated over (a, b) BB2 areas, and (c, d) Urban1 and Urban2 areas.

The observed weighted NO2 in each BB and urban EOF/PC region at the selected times (in green) is larger than the weighted NO2 over the entire region averaged over the year, with the weighted NO2 enhanced by a factor of 8.7 and 9.5 over EOF1 BB2 and EOF3 BB2 respectively, and enhanced by a factor of 18.5 and 13.5 over EOF1 Urban1 and EOF1 Urban2 respectively. These results consistently and uniquely identify those regions on a day-to-day basis that are highly polluted.

Thermodynamic, chemical, and transport coefficients

Using the filtered spatial and temporal features, the best fit coefficients for thermodynamics of NO and NO2 production (\(\alpha_{1}\)), chemical loss of NOx due to in-situ chemical reactions (\(\alpha_{2}\)), and transport of NOx (\(\alpha_{3}\)) are computed following Eq. (5) (see Supplementary Table 1). \(\alpha_{1}\) is found to range from 1.12 to 17.1 [NOx/NO2], \(\alpha_{2}\) is found to range from 4.3 to 23.8 [hours], and \(\alpha_{3}\) is found to range from −19.2 to 11.9 [number of 0.25° × 0.25° grids]. \(\alpha_{3}\) is more frequently negative (54.8% of the grids), consistent with net flow from higher-emission grids into lower-emission grids, while the reverse is observed in 45.2% of grids, consistent with transport by pressure gradients also playing an essential role. This result is a step beyond current approaches to estimating emissions of NOx, CH4, and CO2 from satellite25,36,43, which consider flow only from high-concentration to low-concentration regions.

The ratio of NOx/NO2 in BB areas (1.18 to 17.1) is larger than in urban areas (1.12 to 4.41), indicating relatively more production of NO in BB areas, consistent with the combustion temperature and efficiency in BB areas being lower than for combustion in urban areas (i.e., from power plants, factories, residential use, and transport). In particular, the difference between the 25th and 75th percentile values of \(\alpha_{1}\) over BB areas is broader than over urban areas, consistent with a more diverse set of fuel loadings, land-use types, temperatures and types of combustion (burning, flaming, smoldering, etc.), and energy efficiencies based on how biomass is consumed (fuel, heat, light, etc.), and with known thermodynamics associated with emission temperature44. While local atmospheric processing will also have an impact on this ratio, given that there is no substantial difference in the climatology between these regions, and that both regions under high pollution conditions frequently have very low to nearly zero ozone concentrations, the net effect of the climate processing is not expected to be vastly different between these regions.
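
The spread comparison above is an interquartile range. A minimal sketch of that statistic follows, assuming the fitted coefficient values for a region are available as a one-dimensional array; the function name is hypothetical and no values from this work are reproduced.

```python
import numpy as np


def interquartile_range(values):
    """Difference between the 75th and 25th percentiles, ignoring NaNs."""
    values = np.asarray(values, dtype=float)
    q25, q75 = np.nanpercentile(values, [25, 75])
    return q75 - q25
```

Applying the same function to the BB and urban sets of coefficients gives the two spreads being compared.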

The chemical lifetime is in general longer over BB regions than over urban regions. Most of the 25th percentile \(\alpha_{2}\) values over BB areas are more negative than over urban areas, consistent with the higher average OH levels and more rapid oxidation on average in urban areas45. This is further consistent with the fact that ultra-fast non-linear oxidation at the sub-kilometer level will not be represented at grid-scale46 and in this work is assumed to be a function of the emissions following previous studies47,48. One special case is PC2-BB3-Pos, which has a chemical lifetime similar to urban areas, in part because its biomass burning emissions are adjacent to and interact with the urban chemistry associated with Bangkok, Ho Chi Minh City, and rapidly urbanizing regions around Phnom Penh and Tonle Sap (Supplementary Fig. 1)40. A second special case is found in Urban3, which has a wider range (particularly on the faster lifetime side) than Urban2 and Urban1, driven both by substantial local biomass burning sources within the urban airsheds and by more intense UV radiation on average to drive photochemistry40,49.

The transport term \(\alpha_{3}\) is based on the atmospheric concentration, wind direction, and orientation between adjacent areas, with values close to zero being inconsequential. First, the transport distances from BB areas are smaller than from urban areas, consistent with both the faster chemical decay in BB regions and the lower surface roughness in urban areas. Second, the magnitude of the outward transport tends to be larger than that of the inward transport, consistent with greater chemical loss occurring when polluted plumes are transported into more polluted regions. There are four special regions, three of which are always exporting regions (PC2-BB3-Pos always exports to surrounding urban areas, and PC2-Urban1-Neg and PC1-Urban3-Pos always export to surrounding suburban areas), and one of which is always an importing region (PC1-Urban2-Pos is surrounded by the even more intensely emitting urban and manufacturing centers of Guangzhou, Dongguan, Shenzhen, Xiamen, Xinbei and Taibei, as well as being downwind from biomass burning exported from NEI and the Northern CSA). The special case of BB3 is always negative, as its emissions are advected away from the mountains on the west and the east into the central urban areas exclusively on days 1 to 90 with PC1-Urban3-Pos, which coincides with the high-speed west-to-east wind induced by the transition in the Asian Monsoon50. The special case of PC3-Urban3-Neg demonstrates a mostly positive environment, which occurs infrequently from days 30 to 90 when the biomass burning emissions are high, but otherwise randomly throughout the year, consistent with a blocking type of meteorological environment near the surface39,41. Consequently, some of the regions with the greatest amount of import and export may be dependent on meteorological and climate factors driven far from the source regions51.

NOx emissions

The daily emissions computed on days 51 and 362 have large urban sources of NOx emissions (Fig. 4). The geospatial distribution of emissions captures the largest urban areas, including a cluster in SCC and YRD, a second cluster in the PRD, Fujian and Taiwan Island, and a third cluster around NEI and Bangladesh. By comparison, the daily emissions computed on days 72 and 119 represent high emissions from BB regions (Fig. 4). The geospatial distribution of these emissions is clustered in Northern CSA, Southern CSA, and NEI. These results demonstrate large sources of NOx emissions both from Myanmar, Laos and NEI and from urban regions in the outer parts of the YRD and SCC, although the a priori inventories are very low or non-existent on these days. The ability to identify and quantify emissions occurring at the wrong time and/or the wrong place, or sources which are heavily impacted by changes in climate or other dynamical effects, is essential for attribution work, since areas with zero emissions cannot be scaled or otherwise easily accounted for48,52. This finding is clearly demonstrated in the case of the Hong Kong Special Administrative Region’s inverted NOx emissions, which match the well-known west-to-east, high-to-low emissions gradient (Supplementary Fig. 2), as well as the general magnitude (15.3 ± 3.5 kt yr−1 over a total of 48 days (325 ± 75 t d−1), as compared to the Hong Kong Government’s value of 89.7 kt yr−1 (249 t d−1)), even though major emissions datasets (including EDGAR and MEIC) do not have emissions data for Hong Kong in 2016.

Fig. 4: NOx Emissions of four typical days.

Daily NOx emissions computed on four distinctly characteristic days (a–d): Days 51, 72, 119, and 362.

The annual mean and normalized standard deviation of the extreme event emissions calculated in this work and of the merged EDGAR and FINN emissions are given in Fig. 5. The net a priori emissions show a wider geospatial coverage, because 22.3% of the domain does not have an OMI column NO2 signal which changes enough in at least one of either space or time to be identified using this approach. Over the regions identified, the a priori emissions are on average a factor of 7.5 lower than the mean emissions computed here, with considerable variability occurring in different regions. In the rapidly developing urban areas of Dhaka, Bangkok, Ho Chi Minh City, Manila, Kuala Lumpur, and Hanoi, the emissions are 11 times higher than the a priori, consistent with rapid increases in energy use and economic development and biomass burning in upwind areas. The mean emissions in the PRD are a factor of 4.2 higher than the a priori, consistent with the rapid growth in most of the region outside of the core areas of Guangzhou, Foshan, Dongguan, Hong Kong, and Shenzhen, as observed in both the amount and geospatial distribution of factories and transport sources as well as a broadening of the residential density (both an increase in magnitude and an extension of the geographic extent of emissions). This finding is consistent with the ongoing changes to rapidly increase the development of the PRD into an integrated economic region inclusive of Hong Kong and Macau53. The mean emissions in the biomass burning areas of Myanmar, Northern Laos, and Northern Thailand are higher than the a priori emissions by a factor of 2.5, consistent with increased forest fires. Inland urbanizing regions in China, including Wuhan, Changsha and Nanchang, have mean emissions 2.1 times higher than the a priori, consistent with the overall push by China to increase economic expansion and development in central regions54. However, the most highly developed coastal cities of the region, Shanghai, Suzhou, Hangzhou, Fuzhou, Xiamen, Taibei, and Singapore, have the smallest increase in mean emissions from the a priori, being a relatively smaller factor of 1.8 higher, consistent with higher quality and less rapid growth, as well as better a priori knowledge55. The normalized standard deviation of emissions in Northern Laos, most of Myanmar, Northern Thailand, NEI, and Cambodia is a factor of 0.87 lower than the normalized standard deviation of the a priori inventories (Fig. 5), consistent with the fact that there are 11 days of emissions covering 5.7% of the domain which this approach has identified but which are missed by FINN.

Fig. 5: Statistics of daily emissions over the EOF regions defined during different specific days.

a, b are days defined by the PCs (extreme conditions), (c, d) are days defined by FINN (biomass burning), and (e, f) are days given by EDGAR (entire year). a, c, e represent the mean [kg m−2 s−1] and b, d, f the ratio of standard deviation to mean respectively of NOx emissions.

Over the entirety of the domain, the day-to-day statistics of NOx emissions are given in Fig. 6. Due to low variability, the overall EDGAR a priori NOx emission over the EOF regions has an average value of 3.2 × 10−10 kg m−2 s−1, compared with the average value over the EDGAR domain of 2.4 × 10−10 kg m−2 s−1. The FINN emission over the EOF regions has an average of 1.4 × 10−10 kg m−2 s−1, with a high value from day 1 to day 130 and a low value otherwise, compared with the average value over the FINN domain of 3.3 × 10−11 kg m−2 s−1. This work’s NOx emissions exhibit a larger mean and expanded variability, with [mean, daily variation, and uncertainty] respectively of [1.8 × 10−9, 1.6 × 10−9, 5.1 × 10−10] kg m−2 s−1 and [1.4 × 10−9, 1.1 × 10−9, 5.4 × 10−10] kg m−2 s−1 in urban and biomass burning areas. This work’s emissions are particularly high in urban areas from days 1 to 35, days 50 to 125, and days 280 to 366 with the [mean, daily variation, and uncertainty] respectively [1.9 × 10−9, 1.7 × 10−9, 5.0 × 10−10] kg m−2 s−1, corresponding to when there is decreased UV radiation and slower chemical loss, as well as increased coal combustion for heating and end-of-year factory production. The emissions are particularly low from days 125 to 280 with [mean, daily variation, and uncertainty] respectively [9.3 × 10−10, 3.4 × 10−10, 5.3 × 10−10] kg m−2 s−1, corresponding to more UV radiation and faster atmospheric chemistry, coupled with less power demand required for heating and more available hydropower during the rainy season.

Fig. 6: Time series of estimated and emission inventory-based NOx emissions.

a Est Urban, EDGAR Urban (EOF), and EDGAR Urban (Mask) denote the time series of daily mean estimated NOx emissions over urban regions, EDGAR emissions over urban EOF regions and EDGAR emissions over urban mask regions, (b) Est BB, FINN BB (EOF) and FINN BB (Mask) denote time series of daily mean estimated NOx emissions over BB regions, FINN emissions over BB EOF regions and FINN emissions over BB mask regions.

Over BB regions, the differences in emissions are even larger between the high and the low periods, with combustion being the primary difference. The [mean, daily variation, and uncertainty] of NOx emissions over BB areas from days 60 to 150 is [1.9 × 10−9, 6.9 × 10−10, 7.5 × 10−10] kg m−2 s−1 while the respective values are [1.0 × 10−9, 1.2 × 10−9, 3.6 × 10−10] kg m−2 s−1 from days 1 to 60 and days 325 to 366. It is noted that the average NOx emissions are both very high and relatively consistent during the biomass burning period from days 60 to 150, with low day-to-day variability and relatively low error. During the other biomass burning periods, the average is less high, the day-to-day variability is higher, and the error is quite low, indicating that day-to-day variability is driving the biomass burning emissions from days 1 to 60 and from days 325 to 366. This clearly demonstrates that there are both continuous and variable phases of the burning occurring in these regions, consistent with some amount of anthropogenic forcing involved.

It is noted that the uncertainty is larger than the day-to-day variability only in urban areas on very low emission days. The uncertainty is smaller than the day-to-day variability in all other cases (urban areas with medium and high emissions, and all biomass burning areas). The uncertainty is also smaller than the mean value under all conditions, indicating that the results are nearly always statistically relevant.

The computed day-to-day emissions totals (mean ± error) over all of the EOF regions and times are found to be 61.0 ± 32.6 kt d−1 from biomass burning in Northern CSA (44.3 kt d−1 more than FINN), 4.0 ± 2.3 kt d−1 from biomass burning in Southern CSA (3.2 kt d−1 more than FINN), 14.3 ± 6.1 kt d−1 from urbanization in China (5.0 kt d−1 more than EDGAR), and 5.1 ± 3.2 kt d−1 from urbanization in NEI and Bangladesh (3.7 kt d−1 more than EDGAR). The net NOx emissions are 88.2 kt d−1, compared with the 29.4 kt d−1 sum of FINN and EDGAR over the same region.
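
Emissions are reported here both as fluxes [kg m−2 s−1] and as daily masses [kt d−1]; moving between the two requires the area of each grid cell. The following is a minimal sketch of that conversion for a regular latitude-longitude grid, assuming a spherical Earth of radius 6371 km and the 0.25° × 0.25° grid spacing mentioned earlier. The function names are hypothetical, and whether NOx mass is expressed as NO2 or as NO is not stated in this section and is not handled.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6    # assumed spherical-Earth radius
SECONDS_PER_DAY = 86400.0
KG_PER_KT = 1.0e6


def grid_cell_area_m2(lat_center_deg, dlat_deg=0.25, dlon_deg=0.25):
    """Area of a latitude-longitude cell on a sphere [m2]."""
    lat_south = np.deg2rad(lat_center_deg - dlat_deg / 2.0)
    lat_north = np.deg2rad(lat_center_deg + dlat_deg / 2.0)
    dlon = np.deg2rad(dlon_deg)
    return EARTH_RADIUS_M**2 * dlon * (np.sin(lat_north) - np.sin(lat_south))


def flux_to_kt_per_day(flux_kg_m2_s, lat_center_deg):
    """Convert an emission flux [kg m-2 s-1] in one cell to [kt d-1]."""
    area = grid_cell_area_m2(lat_center_deg)
    return np.asarray(flux_kg_m2_s, dtype=float) * area * SECONDS_PER_DAY / KG_PER_KT
```

A regional total then follows by summing the converted values over all grid cells in the region; because cell area shrinks with latitude, the same flux corresponds to less mass at 30°N than at the equator.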

There is also a quantified connection between the spatial-temporal distribution of NOx emissions and the land-use type (Fig. 7). Cropland areas have emissions that are lower (65.6% of emissions under 1.0 × 10−9, 15.2% of emissions between 1.0 × 10−9 and 2.0 × 10−9, and 19.2% between 2.0 × 10−9 and 5 × 10−9), consistent with biomass burning being carefully controlled to clear new land for future agriculture and to clean out organic rubbish. Savanna regions have emissions that are higher than cropland emissions (39.7% of emissions under 1.0 × 10−9, 33.1% of emissions between 1.0 × 10−9 and 2.0 × 10−9, and 24.4% between 2.0 × 10−9 and 5 × 10−9), consistent with these drier regions being more prone to accidental fires and spread, as well as some amount of peat burning. Broadleaf regions have the highest and most variable emissions of all land-use types (27.3% of emissions under 1.0 × 10−9, 31.5% of emissions between 1.0 × 10−9 and 2.0 × 10−9, and 37.8% between 2.0 × 10−9 and 5 × 10−9), consistent with very hot fires when large trees burn, spread due to upwind/downwind slope effects, burning in remote areas being hard to control, and active control when clearing land for new agriculture at the edges of existing land, or combustion of underbrush to support commercial trees such as wood, rubber, or palm. In addition, there are further differences in similar land-use types as a function of policy, with differences in emissions observed within the same land-use type on opposite sides of national borders in: NEI and Bangladesh, NEI and Myanmar, the triangular region between Myanmar, Laos and Thailand, Thailand and Cambodia, and Cambodia and Vietnam.

Fig. 7: Relations between land cover types and spatial distribution of NOx emission.

a Probability density function of annual mean daily emission over the (b) geolocations of three different aggregated land cover types observed within the net BB areas.
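
The per-range percentages quoted for each land-use type can be reproduced from gridded emissions with a simple histogram. The sketch below assumes a one-dimensional array of annual mean daily emissions [kg m−2 s−1] for the grid cells of one land-use type; the function name is hypothetical, and the bin edges are those quoted in the text.

```python
import numpy as np


def emission_bin_percentages(emissions, edges=(0.0, 1.0e-9, 2.0e-9, 5.0e-9)):
    """Percentage of finite emission values falling in each bin.

    Percentages are relative to all finite values, so values above the
    last edge reduce the total below 100%.
    """
    emissions = np.asarray(emissions, dtype=float)
    emissions = emissions[np.isfinite(emissions)]
    if emissions.size == 0:
        return np.full(len(edges) - 1, np.nan)
    counts, _ = np.histogram(emissions, bins=edges)
    return 100.0 * counts / emissions.size
```

Normalizing by all finite values rather than by the binned values is a design choice made here: it allows the three percentages to sum to less than 100% when some grid cells exceed 5 × 10−9, which appears to be the case for the savanna and broadleaf figures quoted above.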

These results can be applied to finer resolution satellite observations, such as TROPOMI NO2 observations, and to longer time series from merged OMI and GOME NO2 observations, to obtain improved emissions in terms of spatial and temporal coverage. The method can be enhanced to account for new advances with respect to uncertainties of NO2 retrievals, chemical methods, dynamics, and transport. The method has limitations based on satellite retrieval issues due to high cloud cover and the limited coverage of polar orbiting satellites. Furthermore, there are additional uncertainties associated with changes in the land surface due to burning and urbanization, changes in co-emitted BC aerosols leading to additional changes in the retrieved NO2 signal, uncertainties associated with the vertical profile of the emissions, uncertainties associated with hourly and other high frequency extreme events, and other extreme events which are not sensitive enough to be picked up. More access to ground measurements, the next-generation geostationary satellites, and improved knowledge and/or observations of the changes in the climate itself could be used to address many of these issues. Finally, it is hoped that, with more cooperation across the emissions community, the improved emissions estimates in this work will lead to a next generation of a priori emission databases, which in turn would iteratively assist the next generation of results using this approach.

Sensitivity of NOx emissions

Two sensitivity runs are performed to examine how the uncertainty in OMI observations may impact the final emissions results. The first run scales all OMI measurements by −40% [herein called Base-40%] while the second run scales all OMI measurements by +40% [herein called Base+40%] compared to the default run [herein called Base]. The reason for this selection is that ±40% is considered to bound the largest uncertainty in the retrieved OMI NO2 column value found in the literature56,57,58,59,60,61,62,63. Both of these values consider issues including additional absorption by BC above what the a priori model driving the air mass factor computes64,65,66, differences in the plume rise heights from the underlying models (which tend to be less over the regions studied here, although may be more over other regions of the world)27,67,68, and upward looking observations from MAX-DOAS58. The point of these runs is not to comprehensively diagnose the uncertainties, but instead to put a maximum and minimum bound on them, allowing a minimum and maximum value of the emissions to be quantified; on grids and during days when this range is larger than the signal, the user can then choose to include or discard them.

For each case, a new set of emissions is computed following the same procedures and using the same reanalysis and a priori emissions data. As demonstrated in Fig. 8, the differences on a grid-by-grid and day-by-day basis in urban areas are unbiased within the top 85% of data, while the differences on both bases in biomass burning areas are unbiased within the top 67% of data. On a grid-by-grid basis, the difference between Base-40% and Base is always less negative than −1.49 × 10⁻⁹ kg m⁻² s⁻¹, while the difference between Base+40% and Base is always smaller than 0.33 × 10⁻⁹ kg m⁻² s⁻¹. On a day-by-day basis the differences in urban areas are generally larger than the differences in biomass burning areas. Specifically, in urban areas the difference between Base-40% and Base is always less negative than −2.31 × 10⁻⁹ kg m⁻² s⁻¹ and the difference between Base+40% and Base is always smaller than 2.44 × 10⁻⁹ kg m⁻² s⁻¹, while in biomass burning areas the difference between Base-40% and Base is always less negative than −1.17 × 10⁻⁹ kg m⁻² s⁻¹ and the difference between Base+40% and Base is always smaller than 0.13 × 10⁻⁹ kg m⁻² s⁻¹. It is important to note that the total number of grids in the three cases Base+40%, Base, and Base-40% is not the same (Supplementary Tables 13), and therefore all comparisons are being made over the regions of respective overlap.

Fig. 8: Spatial and temporal comparisons within the sensitivity analysis.

Grid-by-grid and day-by-day differences of emissions sensitivity analysis: (a) Base-40% – Base, (b) Base+40% – Base, and (c) time series of day-by-day emission and the respective maximum and minimum values corresponding to the cases Base+40% and Base-40% over urban areas (red solid circle) and BB areas (blue asterisk), respectively.

The daily and yearly emissions and their differences over different regions and total areas are further investigated as shown in Table 2. Over the total area, the daily and yearly differences between Base-40% and Base, relative to the Base case, are −22% and −23.6% respectively, while the corresponding differences between Base+40% and Base are 9% and 10.9%. In all cases the differences are smaller than the 40% changes imposed on the OMI NO2 column loadings, confirming that the approach is robust. A point of interest is that in general the difference between Base+40% and Base yields a much smaller magnitude in emissions than the difference between Base-40% and Base, especially in biomass burning areas. In all cases the low mean and variability in the a priori emissions data, particularly so in biomass burning areas, coupled with constraints placed on the ranges of α1 and α2, lead to a net larger constraint on the computed emissions. It is clear from this sensitivity study that the emissions computed in this work may be too low, although they are already higher than the a priori, especially in biomass burning regions.
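A relative difference of this kind, restricted to the grids available in both cases, could be computed along the following lines. This is a hedged sketch: the function name `relative_difference` and the toy emission values are assumptions introduced for illustration and are not taken from Table 2.

```python
import numpy as np

def relative_difference(case, base):
    """Percent difference in summed emissions, (case - base) / base,
    using only the grids that are valid (finite) in both cases."""
    case = np.asarray(case, dtype=float)
    base = np.asarray(base, dtype=float)
    overlap = np.isfinite(case) & np.isfinite(base)
    if not overlap.any():
        raise ValueError("the two cases share no valid grids")
    base_total = base[overlap].sum()
    return 100.0 * (case[overlap].sum() - base_total) / base_total

# Toy emissions on four grids; NaN marks a grid discarded in that case.
base_emis = np.array([1.0, 2.0, np.nan, 4.0])
low_emis = np.array([0.7, 1.7, 0.5, np.nan])
rel_diff = relative_difference(low_emis, base_emis)
```

Only the first two grids are valid in both toy cases, so the other two do not enter the comparison.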

Table 2 Summary of emissions over five aggregated regions on a day-to-day basis (top half) and a PCA integrated annual basis (bottom half)

The a priori emissions mean plus variability is generally so low that it does not offer enough data to constrain α1 and α2 in a physically reasonable way, in particular over biomass burning regions. This result is clearly demonstrated by the decreased number of fit points per EOF/PCA due to the physical constraints in terms of pairing in the Base+40% case as compared to standard emissions, as well as the slight increase in the number of fit points in the Base-40% case (Supplementary Tables 13). In the Base+40% case, the values of α1 are shifted very low while the values of α2 are shifted very high, particularly in the biomass case. Both shifts are consistent with colder air, as is expected from a larger fraction of the emissions being rapidly emitted at height. In the urban regions, the majority of the difference in case Base+40% is observed in the increase of α3, which is more consistent with missing suburban sources or small industries located away from city centers. This in turn is consistent with the fact that new sources are rapidly changing due to rapid economic growth and expansion, especially into regions which may not be as well regulated. The requirement that the solution space of the three driving coefficients be physically realistic applies additional constraints on the calculated emissions on a grid-by-grid and day-by-day basis. In fact, there are a number of grids for which OMI NO2 data is available in each of these cases which are ultimately not considered in the emissions calculation. The number of grids discarded is slightly different under each case (Base+40%, Base, and Base-40%), with the specific numbers and resulting products described in Supplementary Tables 13.

As observed, the emissions a priori seem very low compared to the inverted emissions, possibly requiring an iterative process between the top-down results given here and bottom-up processes done by others. Perhaps in a step-by-step manner, top-down and bottom-up communities can be used to improve each other. This is especially true since there are underlying non-linear physical issues, including but not limited to: rapid vertical lofting of emitted NOx from both biomass burning and upslope/downslope effects, vast differences in scattered and absorbed UV radiation, and retrieval issues related to the introduction of new sources identified in regions previously assumed to have a value of zero.

Methods

Selected spatial and temporal domain of study

El Niño, a natural anomaly affecting the climate throughout East Asia, South Asia, and CSA, was particularly active from January to November 2016, resulting in major changes in equatorial water temperatures, air temperatures, and precipitation throughout most of the region studied in this work50,54. Furthermore, in 2016 the global distribution of the Indian Ocean Dipole and the North Atlantic Oscillation were also observed in CSA, with a similar set of effects. These net effects resulted in increased land-surface dryness, decreased atmospheric wet removal, stronger winds, and lower cloudiness, all leading to more drought and likely higher than normal emissions50. In fact, the period in January and February 2016 was the second hottest on record during the modern era, with the present year 2024 being the hottest69. For these reasons, it is expected that extremes of NOx emissions should be relatively higher in 2016, presenting an excellent test of the methods in this paper, as well as providing more insight into how emissions will behave under the effects of global climate change over the next few decades. Therefore, the total year of 2016 is selected as the research period.

OMI NO2 dataset

NASA launched Aura on July 15, 2004, in a sun-synchronous, near-polar (98.2-degree inclination) orbit 705 km above the Earth. OMI is a key instrument onboard Aura, observing solar backscattered radiation with hyperspectral imaging in the visible and ultraviolet parts of the spectrum8, allowing detection of O3, NO2, SO2, and aerosols. OMI provides high-resolution (13 × 24 km² at nadir) imaging, capable of tracking column loadings of NO2 pollution at the scale of urban centers and large biomass burning sources. This work uses the Level-3 daily global gridded NO2 product with a 0.25° × 0.25° resolution, including both cloud clearing and validation (cloud fraction < 30%). This product70 has a known reliability of approximately 1.0 × 10¹⁵ molecules cm⁻². All retrieved values which do not pass the quality assurance, are too low to be reliable, or are otherwise missing are not considered further in this work. While this may lead to an underestimation in the total coverage, interpolation has also been shown to present its own errors and additional uncertainties11.

Supplementary Fig. 3 shows the daily measured climatological average of all observations of NO2 in 2016. A higher value appears over known urban and biomass burning regions, including Southern China, NEI and parts of Southeast Asia such as Thailand, Laos and Vietnam. Large urban areas which have a higher level of economic development, mature industries, and very large populations are particularly pronounced, including the YRD and the PRD. Other rapidly developing large urban areas such as Bangkok, Dhaka, Wuhan, Chongqing, Chengdu, Xiamen, Changsha, Nanchang, and Hanoi, and highly developed but smaller urban areas such as Singapore and Taibei, are also clearly observed.

There are two important aspects of OMI NO2 column retrieval bias56,57. The major issues discussed include co-absorption of the NO2 by BC (which is a function of the SSA of the BC and the height of the NO2 and BC) and the general underestimation of total columns as retrieved from satellite with respect to direct sun retrievals made from surface platforms. In the regions of interest in this paper, the SSA is observed to be very low (from 0.84 to 0.91 as measured by AERONET71,72) and the vertical height tends to be relatively high (from 1000 m to 3000 m)73.

Land cover dataset

The MODIS land cover product includes annually computed land cover classifications using Terra and Aqua MODIS data in connection with a decision-tree classification method. Three different land cover classification schemes are applied to derive the Leaf Area Index (LAI) at a 0.05 × 0.05° spatial resolution. This specific work uses the Version 6 land surface type data product specifically from 2016 based on the measurements of LAI, as displayed in Supplementary Fig. 4. Over the area in this study, the land use types of largest area are water (61%), savannah (16%), evergreen broadleaf forest (11%), and grasses/cereal (7.9%).

To analyze land cover in connection with emissions, the individual categories of land cover use are obtained over the same grid points of the net EOF BB regions. Similar land cover classes are grouped into three larger categories: Croplands corresponding to drier regions which are naturally irrigated or do not require irrigation and intensively used for agriculture including grass, cereal, and shrubs; Savanna which are drier and require intensive artificial irrigation or other intervention to be used for intensive agriculture; and Broadleaf which corresponds to the major forest lands, in particular found in hilly or inland jungle conditions.

Atmospheric reanalysis data

ERA-5 is an atmospheric reanalysis product by ECMWF, providing global-scale atmospheric wind speed and other physical and dynamical products from 1950 to the present, with a horizontal resolution of 0.25°62. The product is based on geophysical model physics including: long-wave radiation, a simplified linearized parametrization scheme for surface processes, ozone, improved land component and ocean waves, all within an all-sky data assimilation framework. The product is widely used in atmospheric models and remote sensing applications, including the AMF calculations underlying the NO2 retrievals used in this work63. This work specifically uses horizontal wind near the surface (975 hPa) and at high altitude (800 hPa) to drive the model for urban and biomass burning areas20,33.

A priori emission inventories

FINN is a fire emissions product based on the 1-km level-2 active fire product derived from MODIS Terra and Aqua infrared measurements made in bands B21 (3.929-3.989 μm), B22 (3.940-4.001 μm), and B31 (10.780-11.280 μm)74, with the emissions from each NIR/IR plume based on the fuel availability, plume size, and intensity14. Due to coverage gaps between the adjacent orbits on a day-to-day basis, cloudiness, and optically thick smoke with an AOD larger than 2.0, there are many missed observations, particularly in the equatorial region between 30°S and 30°N75. These results have been used in a bottom-up manner with laboratory-constrained emission factors and estimated environmental fuel loadings at a scale of 1 × 1 km² to estimate the emissions of selected trace gases and aerosols from biomass burning. This product has been used in many common atmospheric chemical transport models and is considered a standard emissions source76. It has been demonstrated that the product has difficulty diagnosing small fires or cloud-covered fires, which has made it difficult for models to reconcile the actual environment with this dataset77. For this combination of reasons, this work utilizes FINNv1.5 from 2016 to retrieve the NOx emission for biomass burning areas in Southeast Asia.

The Emission Database for Global Atmospheric Research (EDGAR) is an anthropogenic emission inventory product, computed using a bottom-up technology-based emission factor approach to calculate the emission for countries over the world and consists of greenhouse gases and air pollutants, such as carbon dioxide, methane, CO, NOx, VOC, ammonia, etc13. This work uses the version 2.2 of EDGAR-HTAP, with a global gridded (0.1° × 0.1°) anthropogenic emissions product, at monthly temporal resolution, for the years 2008 and 2010 (there is no current product for 2016). This is also a standard dataset used by many studies in the modeling and impacts communities48,78.

Both emission inventories are interpolated to the corresponding OMI NO2 grids (0.25° × 0.25°) to be used in the following estimation processes shown in Supplementary Fig. 5.
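One simple way to carry out such an interpolation is separable (bilinear) interpolation on regular latitude and longitude axes, sketched below. The function name `regrid_bilinear`, the grid extents, and the synthetic field are assumptions for illustration; the interpolation scheme actually used in this work is not specified here, and an area-weighted (conservative) scheme may be preferable where regional totals must be preserved.

```python
import numpy as np

def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinear interpolation of field[lat, lon] onto a new regular grid.

    src_lat and src_lon must be increasing. np.interp holds the edge value
    for targets outside the source range, so keep targets inside it.
    """
    field = np.asarray(field, dtype=float)
    # Pass 1: interpolate along longitude for every source latitude row.
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Pass 2: interpolate along latitude for every destination longitude column.
    return np.array([np.interp(dst_lat, src_lat, col) for col in along_lon.T]).T

# Synthetic 0.1-degree source grid and 0.25-degree target grid (cell centres).
src_lat = np.linspace(10.05, 11.95, 20)
src_lon = np.linspace(100.05, 101.95, 20)
dst_lat = np.linspace(10.125, 11.875, 8)
dst_lon = np.linspace(100.125, 101.875, 8)

# A field that varies linearly in latitude and longitude (not real emissions).
fine = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]
coarse = regrid_bilinear(fine, src_lat, src_lon, dst_lat, dst_lon)
```

Because the synthetic field is linear, the interpolated values match the analytic field on the target grid, which gives a quick consistency check of the routine.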

Variance maximization

EOF is a mathematical method that decomposes a dataset in terms of orthogonal basis functions into the factors which contribute the most to the variability of the underlying dataset32. We use this method to extract the spatial and temporal features of the extremes of the remotely sensed NO2 fields. This method has been used in the past for monthly-average climatological AOD, weekly-average climatological CO, and weekly-average climatological NO227,32.

When performing an EOF analysis, the first step is to quantify the most relevant EOFs and PCs as a function of the magnitude of the eigenvalue. The first step to separating urban regions is to identify those regions which both (a) do have known large urban use land types, and (b) contain a large population, or industrial or residential economic usage. To this extent, this section follows the results as generated in Table 1. The cutoffs of the spatial and temporal modes are quantitatively determined employing a recursive cutoff based on the mean and one standard deviation, following the approach in Lin27. First, the weighted average NO2 over the domain (EOF(i,j)*NO2(i,j)) for all values i,j inside the domain is computed using the nine equally spaced percentiles from 10% through 90% of the distribution of EOF(i,j) over the domain. Second, the temporal correlation is calculated between the weighted NO2 and the PC. The combination of lowest possible cutoff value with highest possible absolute value of the R statistic (as good a fit in time with the peaks in the PC) and the lowest possible increase in additional RMSE error for each datapoint added (as good a fit in magnitude with the peaks in the PC) yields the appropriate domain in space. Third, in terms of temporal fit, the appropriate subset of times is obtained from all data in which the PCs fall outside of the mean plus/minus one standard deviation. When the same geographic area is identified in multiple EOFs the time series of these respective modes are different and can be aggregated based on the magnitude of the EOF, so as to use the orthogonality to do a complete reconstruction.
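The decomposition itself, and the selection of times at which a PC falls outside the mean plus or minus one standard deviation, can be sketched with a singular value decomposition. This is an illustrative sketch on synthetic data, assuming a gap-free (time, space) matrix; the function names are hypothetical, and the percentile-based spatial cutoff described above is not implemented here.

```python
import numpy as np

def eof_analysis(data, n_modes=3):
    """EOF decomposition of a (time, space) array without missing values.

    Returns EOFs (mode, space), PCs (time, mode), and the fraction of
    variance explained by each retained mode.
    """
    anomalies = data - data.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]
    pcs = u[:, :n_modes] * s[:n_modes]
    explained = s[:n_modes] ** 2 / np.sum(s ** 2)
    return eofs, pcs, explained

def extreme_times(pc):
    """Boolean mask of times where the PC lies outside mean +/- one std."""
    return np.abs(pc - pc.mean()) > pc.std()

# Synthetic daily field: one spatial pattern with a single seasonal peak plus noise.
rng = np.random.default_rng(0)
days = np.arange(365)
season = np.exp(-((days - 75) / 20.0) ** 2)
pattern = np.linspace(0.0, 1.0, 50)
no2 = 5.0 * np.outer(season, pattern) + 0.1 * rng.standard_normal((365, 50))

eofs, pcs, explained = eof_analysis(no2)
mask = extreme_times(pcs[:, 0])
```

The sign of each EOF and PC pair is arbitrary, which is why the time selection uses the absolute deviation from the mean.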

In all cases, this approach guarantees that our spatial-temporal domain contains the most substantial signals of the remotely sensed NO2 fields, which in turn contribute to the maximum changes. These changes are in turn those most responsible for changes in emissions of NOx. These sources which are changing the most are those which are most likely to be mis-diagnosed by current emissions inventories for two major reasons. First, current emissions inventories do not consider day-to-day and other high-frequency variations on a grid-by-grid basis, which this approach makes explicit. Second, some fraction of emissions from biomass burning and regions undergoing rapid urbanization are completely missing in existing emissions inventories20,32, which this work also captures, since it is analyzing the tropospheric column, not merely the surface.

Selection of urban and non-urban areas

Due to the considerable economic and political diversity in CSA, there are many people who live in rapidly changing communities in countries such as Vietnam, Laos, Cambodia, Thailand, Myanmar, Indonesia and Malaysia. Within sizable portions of each of these countries, there still is a large amount of biomass used for cooking and clearing of farmland, with most occurring during the local dry season from mid-February through mid-April. The conditions are climatologically similar, although with a very different economic and political profile, in NEI and Bangladesh. Due to the higher level of economic development and stricter government policies, there are fewer people in southern China cooking with biomass. On top of this, there is a rapidly increasing number of automobiles, widespread city construction, and even small but frequent wildfires that are observed in the local dry season. Overall, throughout the regions of interest there are sizable sources from both biomass and urban sources, which contribute to time-varying intense air pollution events across many broad air pollutants.

Due to their orthogonal and specific nature, this work utilizes as an a priori the respective NOx emission from FINN for biomass burning locations and EDGAR for urban regions. The total emissions have been distributed into 9 separate groups, as shown in Table 1. There are 3 specific BB areas: BB1, BB2, BB3, three specific urban areas: Urban1, Urban2, Urban3 and three other areas: a Tibetan area, a Mixed area and an Equatorial area. The Mixed Area is the only one which uses data from both FINN and EDGAR, when and where overlap actually occurs.

The region defined as Urban1 area mainly covers the urban regions over provinces along the Yangtze River, such as Sichuan, Chongqing, Guizhou, Yunnan, Guangxi, Hunan, Hubei, Anhui, Jiangxi, Zhejiang, Shanghai and Jiangsu, which are mainly impacted by the subtropical mid-latitude climate. In these areas, cities in the YRD are more developed and have a larger population density than the other cities. The region defined as Urban2 covers the same-latitude urban regions of Guangdong, Fujian, Taiwan Island, and Hong Kong, which are climatologically impacted by sea, mountains, and the Asian Monsoon, and have a high population density and extensive industry. The region defined as Urban3 includes the major cities of CSA including Bangkok, Kuala Lumpur, Singapore, Hanoi, Ho Chi Minh City, Manila, as well as Bangladesh’s Dhaka, and some cities in NEI, like Calcutta. Cities here are impacted by tropical climate and generally have both a large population and high population density, and in general are less developed economically than the former two areas (with the exception of Singapore). The region defined as the Mixed area includes the remaining areas of China not previously mentioned, and not contained within BB2.

The region defined as BB1 covers the BB areas in NEI, Nepal, Bangladesh and Bhutan. The region defined as BB2 covers Southwestern Yunnan Province, a small part of NEI and most of Northern CSA, including Myanmar, Northern Thailand, Laos and Northern Vietnam. The region defined as BB3 covers the other areas of CSA except for Peninsular Malaysia and Singapore. These BB areas are mainly impacted by a Tropical monsoon climate, including both the Indian Monsoon and the Asian Monsoon.

The region defined as the Tibetan Area mainly includes Tibet and the mountains of Western Sichuan, as well as some nearby high-altitude regions. Overall, this area has a very low population density, a low level of atmospheric pollution, and a sizable fraction of its emissions coming from wildfires. The Mixed Area has a combination of small to medium cities, mountains and forests, and therefore has characteristics of both the anthropogenic and biomass burning types. The Equatorial Area includes the islands around the equator in the Maritime Continent except for Singapore (Indonesia, Malaysia and Brunei) as well as the Philippines. This area has a high occurrence of wildfires each year in August through October, and a large number of rapidly developing and highly dense urban areas. However, this region is also heavily impacted by rain, and therefore does not have many days of measurements available.

Mass conserving equation

The basic mass-conserving equation for a tracer in the atmosphere (in this case NOx) is shown in Eq. (1). The terms include the time rate of change between the previous day’s and current day’s loading of NOx, \(\frac{\partial (V_{\mathrm{NO}_x})}{\partial t}\), the emissions of NOx, \(E_{\mathrm{NO}_x}\), the loss of NOx due to chemical decay, \(\mathrm{Loss}_{\mathrm{NO}_x}\), and the transport of NOx, comprising advective transport and pressure-transport, \(\nabla \left(\bar{\mathbf{u}}\cdot V_{\mathrm{NO}_x}\right)\), where \(\bar{\mathbf{u}}\) denotes the horizontal wind vector (in terms of both u and v) and V denotes the tropospheric column, as in previous studies5. Three simplifications are then made to allow the equation to be readily solved within the context of the available measurements from OMI. First, there is a linearization between NOx and NO2, \(\alpha_1\), whereby the ratio of NOx/NO2 in Eq. (2) is based on the thermodynamics of the NOx formation in the flame1. Second, the loss term is linearized in Eq. (3), where \(\alpha_2\) is the rate of reaction times the concentration of OH, \(C_{\mathrm{OH}}\), responsible for the major conversion of atmospheric NOx into nitric acid79. Third, as given in Eq. (4), \(\alpha_3\) denotes the weighted distance of the horizontal grid over which the transport of NOx occurs. These terms are merged together in Eq. (5). In this equation, grids missing OMI NO2 data are discarded; only grids with OMI NO2 observations are used. The a priori emission inventories are also used after interpolation to the resolution of the OMI NO2 product (0.25° × 0.25°).

$$\frac{\partial (V_{\mathrm{NO}_x})}{\partial t}=E_{\mathrm{NO}_x}-\mathrm{Loss}_{\mathrm{NO}_x}+\nabla \left(\bar{\mathbf{u}}\cdot V_{\mathrm{NO}_x}\right)$$
(1)
$$\frac{\partial (V_{\mathrm{NO}_x})}{\partial t}=\frac{d(\alpha_1 V_{\mathrm{NO}_2})}{dt}=\alpha_1\frac{dV_{\mathrm{NO}_2}}{dt}$$
(2)
$$\mathrm{Loss}_{\mathrm{NO}_x}\approx \alpha_2^{\prime} * C_{\mathrm{OH}} * V_{\mathrm{NO}_x}=\alpha_2^{\prime} * C_{\mathrm{OH}} * \alpha_1 * V_{\mathrm{NO}_2}=\alpha_2\alpha_1 V_{\mathrm{NO}_2}$$
(3)
$$\nabla \left(\bar{\mathbf{u}}\cdot V_{\mathrm{NO}_x}\right)=\alpha_3\nabla \left(\bar{\mathbf{u}}\cdot \alpha_1 V_{\mathrm{NO}_2}\right)=\alpha_3\alpha_1\nabla \left(\bar{\mathbf{u}}\cdot V_{\mathrm{NO}_2}\right)$$
(4)
$$\alpha_1\frac{dV_{\mathrm{NO}_2}}{dt}=E_{\mathrm{NO}_x}-\alpha_2\alpha_1 V_{\mathrm{NO}_2}+\alpha_3\alpha_1\nabla \left(\bar{\mathbf{u}}\cdot V_{\mathrm{NO}_2}\right)$$
(5)
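Rearranging Eq. (5) for the emissions term gives E = α1 dV/dt + α1 α2 V − α1 α3 ∇(u·V), with V the NO2 column. A minimal finite-difference sketch of this evaluation on a regular grid is given below. It rests on several assumptions not stated in the text: the function and variable names are hypothetical, the time derivative is a one-day backward difference, the transport term is computed as a flux divergence per metre with `np.gradient`, and α3 is treated as a dimensionless scaling. The coefficient values are arbitrary placeholders, not fitted results.

```python
import numpy as np

def emissions_eq5(col_prev, col_curr, u_wind, v_wind, dx, dy, dt, a1, a2, a3):
    """Evaluate Eq. (5) solved for E on a [lat, lon] grid.

    col_prev, col_curr: NO2 columns on consecutive days; dx, dy in metres,
    dt in seconds. NaN inputs propagate, so grids with missing data drop out.
    """
    dvdt = (col_curr - col_prev) / dt
    # Transport term: d(u V)/dx + d(v V)/dy by finite differences.
    transport = (np.gradient(u_wind * col_curr, dx, axis=1)
                 + np.gradient(v_wind * col_curr, dy, axis=0))
    return a1 * dvdt + a1 * a2 * col_curr - a1 * a3 * transport

# Toy steady-state case: column increasing linearly eastward, uniform zonal wind.
ny, nx = 4, 5
dx = dy = 25_000.0                      # grid spacing in metres (assumed)
x = np.arange(nx) * dx
col_curr = np.tile(1.0e-6 + 1.0e-12 * x, (ny, 1))   # illustrative kg m^-2
col_prev = col_curr.copy()
u_wind = np.full((ny, nx), 5.0)         # m s^-1
v_wind = np.zeros((ny, nx))

a1, a2, a3 = 4.0, 1.0e-5, 1.0           # placeholder coefficients
emis = emissions_eq5(col_prev, col_curr, u_wind, v_wind, dx, dy, 86_400.0, a1, a2, a3)
```

The sign of the transport term follows Eq. (1) as written; a different sign convention for the flux divergence would flip the last term.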

To find the best-fit values for \(\alpha_1\), \(\alpha_2\), and \(\alpha_3\), a least-squares method is used. This is a statistical procedure that finds the best fit by minimizing the sum of the squares of the residuals80, as given by Eq. (6), where \(f\left(x_i\right)\) is based on Eq. (5), and \(y_i\) is given by the a priori emission value. In this work, the a priori is given by the sum of the NOx emissions from FINN and EDGAR.

$$\min \left\{\mathrm{f}\left(x\right)=\sum_{i=1}^{m}\left[y_i-f\left(x_i\right)\right]^{2}\right\}$$
(6)
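Because Eq. (5) is linear in the products α1, α1α2, and α1α3, one way to minimize Eq. (6) is ordinary linear least squares on those products, followed by division by α1. The sketch below illustrates this on synthetic, noise-free data. It is an unconstrained fit: the physical range constraints on the coefficients discussed in this work are not implemented, and the function name `fit_alphas` and all input values are assumptions for illustration.

```python
import numpy as np

def fit_alphas(dvdt, column, transport, e_prior):
    """Least-squares fit of Eq. (5) to a priori emissions.

    Model: e_prior ~ b1*dvdt + b2*column - b3*transport,
    with b1 = a1, b2 = a1*a2, b3 = a1*a3.
    """
    design = np.column_stack([dvdt, column, -transport])
    beta, *_ = np.linalg.lstsq(design, e_prior, rcond=None)
    a1 = beta[0]
    return a1, beta[1] / a1, beta[2] / a1

# Synthetic predictors of order one (real inputs would need rescaling first).
rng = np.random.default_rng(1)
dvdt = rng.normal(size=200)
column = rng.uniform(1.0, 3.0, size=200)
transport = rng.normal(size=200)

true_a1, true_a2, true_a3 = 4.0, 0.2, 0.5   # placeholder values
e_prior = (true_a1 * dvdt + true_a1 * true_a2 * column
           - true_a1 * true_a3 * transport)

a1_fit, a2_fit, a3_fit = fit_alphas(dvdt, column, transport, e_prior)
```

With real data the three predictors differ by many orders of magnitude, so normalizing each column before the fit would likely be needed for numerical stability.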

The fits of \(\alpha_1\), \(\alpha_2\), and \(\alpha_3\), and the values of computed weighted NOx over the EOF fields in comparison to the PCs, are further explained using correlation, the RMSE, and simple probability density function (PDF) analysis.

Bootstrapping

Bootstrapping is a statistical technique which utilizes random sampling with replacement, and is capable of handling all types of PDFs. This method can be computed on a laptop or desktop using commonly available software; in this work MATLAB was used. This method can be widely applied to estimate the variation of statistics (bias, variance, confidence intervals, etc.)81. In this work, this method has been used to sample the values of \(\alpha_1\), \(\alpha_2\), and \(\alpha_3\), which are then in turn used to compute different permutations of emissions from the already constrained PDFs of the coefficient values and their associated uncertainty ranges. The uncertainty ranges are computed as half of the difference between the 25th-percentile and 75th-percentile results of the bootstrap distribution on a pixel-by-pixel and day-by-day basis. These numbers are then combined in space and time via the root-mean-square technique to form a region-by-region uncertainty on a day-by-day and year-by-year basis. Finally, the mean and standard deviation of these individually computed emissions are finalized over each specific sub-domain and each month, and then merged to cover the entire domains of interest, in a bottom-up manner.
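The sampling and uncertainty steps can be sketched as follows for a single pixel and day. The sketch is written in Python rather than MATLAB and is not the code used in this work. The function names, the pools of coefficient values, and the pixel inputs are assumptions for illustration, and the three coefficients are drawn independently, which may differ from how they are paired in practice.

```python
import numpy as np

def bootstrap_emissions(a1_pool, a2_pool, a3_pool, dvdt, column, transport,
                        n_boot=2000, seed=0):
    """Resample coefficient values with replacement and evaluate Eq. (5).

    Returns the mean emission and half of the 25th-75th percentile spread.
    """
    rng = np.random.default_rng(seed)
    a1 = rng.choice(a1_pool, size=n_boot, replace=True)
    a2 = rng.choice(a2_pool, size=n_boot, replace=True)
    a3 = rng.choice(a3_pool, size=n_boot, replace=True)
    emis = a1 * dvdt + a1 * a2 * column - a1 * a3 * transport
    q25, q75 = np.percentile(emis, [25, 75])
    return emis.mean(), 0.5 * (q75 - q25)

def combine_rms(uncertainties):
    """Root-mean-square combination of per-pixel or per-day uncertainties."""
    values = np.asarray(uncertainties, dtype=float)
    return np.sqrt(np.mean(values ** 2))

# Placeholder pools of fitted coefficient values and one pixel's inputs.
a1_pool = np.array([3.5, 4.0, 4.5, 5.0])
a2_pool = np.array([0.15, 0.20, 0.25])
a3_pool = np.array([0.4, 0.5, 0.6])
mean_emis, half_iqr = bootstrap_emissions(a1_pool, a2_pool, a3_pool,
                                          dvdt=0.1, column=2.0, transport=0.3)
regional_unc = combine_rms([half_iqr, 0.5 * half_iqr])
```

Fixing the seed keeps the resampling reproducible; in practice the number of draws would be chosen so that the percentile estimates are stable.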

Emissions calculation

Based on the workflow chart of NOx estimation presented in Supplementary Fig. 5, daily emissions of NOx are calculated throughout the entire year using Eq. (5) once the values of \(\alpha_1\), \(\alpha_2\), and \(\alpha_3\) have been derived and bootstrapping has been undertaken. Spatially averaged emissions are calculated over the extracted respective urban and EOF BB regions at the corresponding time periods. Masked NOx emissions are calculated over the default urban and BB areas shown in Fig. 1, without time constraints.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.