A big data approach to improving the vehicle emission inventory in China

Estimating truck emissions accurately would benefit atmospheric research and public health protection. Here, we developed a full-sample enumeration approach TrackATruck to bridge low-frequency but full-size vehicles driving big data to high-resolution emission inventories. Based on 19 billion trajectories, we show how big the emission difference could be using different approaches: 99% variation coefficients on regional total (including 31% emissions from non-local trucks), and ± as large as 15 times on individual counties. Even if total amounts are set the same, the emissions on primary cargo routes were underestimated in the former by a multiple of 2–10 using aggregated approaches. Time allocation proxies are generated, indicating the importance of day-to-day estimation because the variation reached 26-fold. Low emission zone policy reduced emissions in the zone, but raised emissions in upwind areas in Beijing's case. Comprehensive measures should be considered, e.g. the demand-side optimization. There lacks a method to measure the rapid changes of vehicle emissions. Here the authors proposed a big data approach ‘TrackATruck’, and their estimates using the new approach show that the heavy-duty trucks (HDT) emissions of primary cargo routes/terminals were underestimated by 2–10 times in proxy-based emission inventories.

T he accurate estimation of anthropogenic emissions is the foundation for understanding the interaction between human activities and the atmosphere. Traffic-related sources contributed 23% of CO 2 emissions 1 and 31% of NO x emissions 2 globally and up to 45% of PM 2.5 concentrations in China's megacities [3][4][5] , resulting in air pollution, global warming and ecological deterioration 6,7 . Furthermore, the variation in traffic emission is significant in both the temporal and spatial dimensions, bringing environmental effects of varying and even opposite degrees. For example, both ozone formation and loss were reported due to vehicle emissions in different environments [8][9][10] . Therefore, improving the quality of the traffic emission inventory requires not only accurate estimation of the total emissions but also accurate characterization of their temporal and spatial distribution.
High-quality statistical vehicle activity data and emission factors usually ensure the accuracy of the total amount, but it is more challenging to obtain a detailed distribution 11 . Taking lightduty vehicles (LDVs) as an example, by continuously improving the accuracy of emission factors and using data such as urban registered holdings and local gasoline sales to constrain the total activity level, the uncertainty regarding the total LDV emissions can be controlled to~20% 12 . However, heavy-duty truck (HDT) trips are long-haul and frequently cross municipal boundaries, so the total amount cannot be represented by local vehicle registration or fuel sales. This uncertainty is particularly significant in countries that are rapidly advancing HDT emissions control. China, Russia, South Korea, India, Turkey and Argentina have updated HDT emissions standards two or three times in the past 10 years 13 , and so the update cycle on these standards has thus been shorter than the life cycle of HDT 14 . Therefore, the HDT fleet in these countries contains at least three emission levels (e.g., China has China III, China IV and China V). Some regions, for example, Tokyo, Saitama and Kanagawa in Japan; Mumbai, Kolkata and Chennai in India; and California in the United States, enforce stricter emissions regulations than neighbouring areas. For these regions, the fleet composition of HDT is different than that of other places. In China, Beijing implemented the China III emission standards four years earlier than other provinces. In the following years, Beijing's China III HDT ratio for local HDT was three times higher than that in neighbouring provinces 15 . In summary, due to the rapid updating of emission standards and advanced implementation policies, China's HDT emissions are highly heterogeneous. Thus, the total uncertainty regarding the HDT emission inventory is still large 16,17 , and the distribution caused by HDT movements continues to be difficult to evaluate. Therefore, this study focuses on this characteristic of HDTs and improves the total emissions quantity and distribution data by improving the activity data; the accuracy of emission factors is not within the scope of this study.
The key to improving the accuracy of the HDT activity dataset is converting the aggregate proxies into single descriptions of all samples. Full-sample enumeration has been used in shipping emission inventory development, e.g., the STEAM model 18,19 , but not in vehicle emission estimation. Ships frequently cross-borders, and the areas of pollutant emission are often not coincident with areas where ships are registered. Therefore, it is necessary to trace the ship emission one by one. In contrast, on-road vehicles usually run in a certain range, so the statistics based on the regional on-road vehicle population can basically reflect the emissions, especially for passenger cars. The cross-border degree of freight trucks is between ships and passenger cars. Most vehicle emission studies are based on the reported VKTs or driving behaviours of hundreds or thousands of samples 20,21 . However, if we compare the sample size to the vehicle population, the proportion of samples is usually less than 1%. Expansion from some samples to all samples introduces inestimable errors. Even though for some countries or areas, top-down inventories might use VKTs from vehicle registration data for all registered trucks in a city or province, the cross-boundary movements of trucks will introduce large errors in the actual local VKTs because the distribution of VKTs in multiple regions is unknown. In addition, although the aggregated proxies can reflect the historical regularity of vehicle activity, they are not as able to reveal the impact of irregular changes, e.g., new traffic policies. Since 2017, Beijing has successively implemented policies such as the introduction of a low emission zone (LEZ) 22 for HDTs, the elimination of China III HDTs 23 , and the reduction in on-road bulk cargo transportation. These policies have directly limited the number of HDT activities on roads, which may have caused significant changes in HDT emissions in a short period that cannot be reflected by aggregate proxies.
This study proposes an approach called TrackATruck that has better capabilities in terms of vehicle-to-vehicle emissions evaluation. Using the TrackATruck approach with over 200 billion HDT signals from the BeiDou Navigation Satellite System (BDS) 24 , an emission inventory is established that has higher temporal and spatial resolution and lower uncertainty in the Beijing-Tianjin-Hebei (BTH) region of North China. Significant discrepancies were found between the traditional HDT emission allocation and the actual HDT emission distribution. Thus, universal time allocation parameters (proxies) of HDT emissions are refined based on the emission patterns. The effects of typical policies on HDT emissions control are evaluated. It's found that the control of low emission zone has led to detours which caused the emissions increase in other regions.

Results
Emission inventory in the BTH region. In this study, a Track-ATruck emission model was established using low-precision, fullsize vehicle travels big data to estimate fine-grain vehicle emissions (see Methods). The HDT emission inventory calculated by this method can reflect the influences of real-time driving conditions on every single HDT and the overall emissions. Based on the 200 billion BDS signals of HDTs, a high-resolution emission inventory was established for the BTH region using the Track-ATruck model (Fig. 1). The BTH region, with an area of 218,000 km 2 located in North China, has more than 100 million residents, the Chinese capital and three large ports. Therefore, this region has high demand for freight transportation. The annual PM 2.5 emission of HDTs in the BTH region were 3739 Mg in 2017 and 3869 Mg in 2018, reflecting an increase of 3.5% (Fig. 1a). The annual NO x emission of HDTs were 136,540 Mg in 2017 and 155,107 Mg in 2018, representing an increase of 13.6%. Approximately 31% of NO x emissions were from non-local HDTs in 2017 and 2018. The day-to-day variation in HDT emissions in this region is very substantial (Fig. 1b). In 2018, the minimum NO x emission of HDTs was 20.56 Mg day −1 on 16th Feb. (the first day of the Lunar New Year festival), and the maximum NO x emission of HDTs was 552.31 Mg day −1 on 29th Sept, for a 26fold difference. The China IV HDT is the primary subsector accounting for an average of approximately 53% of total PM 2.5 emissions and 55% of total NO x emissions over a two-year span. High-resolution gridded HDT emissions data in the BTH region were also presented (Fig. 1c). The HDT emissions were highly concentrated on a small number of intercity roads throughout the BTH region. The NO x emission intensity of HDTs on these roads was generally above 5 Mg grid −1 year −1 . For example, sections of the 6th Ring Road, 5th Ring Road, highways from Beijing to Tianjin, and highways from Beijing to Zhangjiakou had high emission intensities (subgraph in Fig. 1c). The HDT emissions in the western mountains of BTH were significantly lower than those in the eastern plains.
Performance assessment of the TrackATruck model. A model validation was conducted considering two aspects. The individual vehicle result validation is introduced in the Methods section as a validation of the method itself. Here, to comprehensively evaluate the performance of the model, both the total amount and the spatial allocation were analysed or compared with those of other inventories. Recent studies with the top-down method have suggested that the PM 2.5 emission of HDTs in BTH are in the range of 2303 to 15,200 Mg year −1 in 2014-2015 25,26 , while those estimated by this study is approximately 3804 Mg year −1 (twoyear average). The difference between previous studies and this study is from −39% to 299%, with a coefficient of variation about 99%. This very large difference comes from the total VKT, fleet composition and emission factors used. Although the single-truck VKT is recorded by an odometer, it is difficult to determine how much of the driving occurred in the BTH region, which may lead to overestimation. On the other hand, omitting the emissions from non-local HDTs driving into the BTH region will result in underestimation of the emissions. Thus, the total VKT is significantly different between these studies. Here, using truck-bytruck big data, this study advances the calculation of the total VKT, the proportion of local/non-local HDT, and the fleet composition (e.g., the proportion of old trucks). Our results show that local HDTs contributed only 69% of emissions for both PM 2.5 and NO x over a span of two years in the region (Fig. 1a).
Several studies in Beijing used traffic-volume data to estimate the total on-road HDT emissions. Due to the uncertainty in distinguishing vehicle technology categories from traffic observations, the values calculated by different studies for HDTs can differ by three orders of magnitude (2.81 Mg year −1 of PM 2.5 to 1189 Mg year −1 of PM 2.5 ) 27,28 . In addition, some studies claimed that they used traffic-volume data to calculate high-resolution emissions inventories, but only the distribution, not the total annual HDT emissions, was provided 29,30 . The PM 2.5 emissions of Beijing's HDTs calculated in this study were 218.76 Mg year −1 (two-year average). Compared with the traffic-volume-based approach, our approach is feasible for both a large area and the city centre. In addition, the vehicle technology category (e.g., China III or IV) usually leads to a large difference in emissions, which cannot be directly identified in the traffic-volume data. This information for each individual vehicle is kept until the emission aggregation stage of the TrackATruck approach; hence, a high-resolution and detailed emission composition can be provided.
The third advantage lies in the spatial distribution of emissions by TrackATruck, which is more in line with the actual characteristics of cargo activities. Figure 2a shows a comparison of the NO x emissions of HDTs of 155 counties in the BTH region calculated by the vehicle-stock-based top-down method with those calculated by the TrackATruck method using the same emission factor. In 79% of counties, the emissions estimated by top-down method are out of the range of −50% to 150% compared to the TrackATruck results, (out of range between the dashed lines in Fig. 2), and 4% of counties which differences are more than 15 times. Further analysis was performed on the two counties (Binhaixinqu and Jingxiuqu) that have the largest differences between the two methods. It was found that based on the top-down inventory method, the emissions of Binhaixinqu  (Fig. 2b, c), Binhaixinqu has not only a much larger area than Jingxiuqu but also multiple highways and a large port (Tianjin port). In contrast, there are few major freight corridors and no other type of transportation hub in Jingxiuqu. From the information on the map, we believe that the emissions calculated by the TrackATruck method can more reasonably reflect the distribution of HDT emissions within the BTH region.
To further identify the ability of the TrackATruck method to carry out spatial allocation, the same total emissions were allocated into 0.01°grids by seven allocation proxy schemes used in previous studies 16,17,31 , including population (M1), road density (M2), population and road density (M3) and VKT weighted road density (M4-1 to M4-4); see details in the Methods. Next, we used Pearson's correlation to evaluate the spatial correlation between 0.01°gridded NO x emissions of the HDTs of these schemes and the results of this study (Fig. 3a). M1 and M3 have the lowest correlation, showing that the distribution of HDT emissions has almost no correlation with the population density. The correlations of M2 (road density) and M4-1 to M4-4 (road-density-related allocation scheme with different weighting factors) to the TrackATruck results are similar but all below 0.5, indicating that none of the proxy schemes perform well. The results show that the spatial correlations are 0.47 (M4-1) and 0.45 (M4-2) and that they are slightly higher than those of M2, indicating that an appropriate VKT allocation weight can slightly improve the accuracy of the spatial allocation of HDT emissions based on the road density. M4-3 and M4-4 are the respective optimization results of M4-1 and M4-2 obtained by Ye's method 32 (see Methods). The spatial correlations after optimization are not significantly improved compared to those before optimization. The reasons for the low correlation between the road-density-based proxy and real-world driving were further analysed by calculating the ratio between the HDT emissions for each 0.01°grid from this study and that from the M4-1 method (Fig. 3b). One reason is that the proxy cannot reflect HDT activity with various influences in the real world. The ratio of the grids around cities and ports with more freight demand is usually greater than 1, and the ratio of some grids is greater than 10 (e.g., Tianjin port, see Fig. 3c). We further checked these grids and found that they are cargo terminals. In these grids, HDT idling or low speed travel will continue over a longer time and cause more emissions in a small area 33 . Ratios smaller than 1 appear in the regions with high road density, which are large cities such as Beijing, Tianjin and Shijiazhuang. Because these cities have a policy for long-haul truck detours, the road-density-based allocation scheme may overestimate the emissions in these regions (Fig. 3c). The other reason is that the aggregation method does not consider the variance in the busyness level for the same road type. For example, Fig. 3d shows that the two highways (Tanggang and Qinbin highways) have significantly different ratios, which indicates that due to the higher HDT traffic volume, the emissions of Qinbin highway in this study are significantly higher than that of Tanggang highway; however, in the M4-1 method, the VKT allocated values in these two highways are similar, and the emissions are not obviously different. Overall, our results show that even with the same total HDT emissions, the distribution on some primary cargo routes/terminals can be underestimated by 2-10 times in proxy-based emission inventories, while emissions on other routes are overestimated.
Proxies for emission allocation. Compared to big-data analysis, the top-down approach has the advantage of being easy to apply. Therefore, we try to extract more general rules to provide parameters to improve the top-down emission inventory. For spatial allocation, our analysis above shows that no simple allocation scheme has a better correlation with big-data results. Therefore, we do not have any recommended scheme for spatial allocation. However, considering the importance of the time pattern in emission inventory studies 34 , we update the proxies of temporal allocation for Beijing and the BTH region (Fig. 4) Largest/smallest values within a 1.5-fold wider range above the 25th/below the 75th percentiles.
Outside value Fig. 4 The temporally allocated proxies at various time scales obtained by using the NO x emission data of HDTs. a Monthly allocation proxy in Beijing. b Monthly allocation proxy in the BTH region. c Weekday allocation proxy in Beijing. Where the n = 683 independent days. d Weekday allocation proxy in the BTH region. Where the n = 683 independent days. e Hourly allocation proxy in Beijing. Where the n = 16464 independent hours. f Hourly allocation proxy in the BTH region. Where the n = 16464 independent hours. Some samples during the special festivals were removed when statistics of weekday/ hourly proxies were conducted, to obtain more general results. *Source data are provided as a Source Data file. According to the policy requirements, most of the China III HDTs are banned. Therefore, the China III HDT emissions in Beijing dropped rapidly after the implementation of the LEZ (Fig. 5a, b). Similarly, all non-local HDTs are subject to LEZ controls; thus, the emissions of non-local HDTs have also been rapidly reduced after the implementation of the LEZ, and the daily emissions thereafter are lower than the levels before the implementation of the policy. This study also analysed the time series of HDT emissions of another city near Beijing but found no similar emissions changes during the same period (e.g., Fig. 1b).
Therefore, we believe that the changes in HDT emissions are unique to Beijing and are likely to be attributed to Beijing's LEZ.
We compared HDT emissions in whole Beijing in the same months before (1st Jan. to 20th Sept. 2017) and after (1st Jan. to   The LEZ prevents a certain group of HDTs from entering urban areas, and the most likely result is that non-banned HDTs fulfil the demand formerly provided by the banned HDTs. Overall, the policy resulted in lower PM 2.5 emissions in Beijing but not lower NO x emissions. The NO x emission factors highly rely on the correct usage of a selective catalyst reduction (SCR) system. Without further reports on real-world SCR use cases, we did not reduce the NO x emission factor for China V HDTs from China IV 36 .
More specifically, LEZ aims to control HDT emissions inside the Beijing's 6th Ring Road (Fig. 6a). Thus, the NO x emissions inside the 6th Ring Road from non-local HDTs decreased from 2128.58 Mg year −1 in 2017 to 1628.11 Mg year −1 in 2018, a decrease of 500.47 Mg. However, the local HDT emissions increased to fill the capacity gaps left by the non-local HDTs (Fig. 6b). The NO x emissions inside the 6th Ring Road from local HDTs increased from 2288. For the intercity HDT emissions, the change in the spatial distribution is due to HDTs detouring around the city due to the LEZ (Fig. 6c). The original route goes through the 6th Ring Road of Beijing, which is short and usually uncongested. However, due to the LEZ restrictions, HDT drivers chose to detour through the three cities west and south of Beijing, causing the NO x emissions along the corresponding routes to increase significantly between the two years (Fig. 6d). The net changes in the alternative route and the original route were 857.72 Mg and −530.07 Mg, respectively. Overall, 79% of the reduction in HDT emissions from the original route occurred in Beijing, while an increase in HDT emissions on the alternative route occurred in the other three cities (Fig. 6e). The alternative routes with increased HDT emissions were located southwest of Beijing. Historical observations show that the more severe haze in Beijing mostly occurs under meteorological conditions dominated by a southwest wind [37][38][39] . Therefore, although the LEZ shifted the HDT emissions far from urban areas, the detours might lead to more NO x emissions occurring in an adverse position and impacting Beijing's air quality standards. Thus, the final air pollution impact of LEZ on Beijing's primary air pollutant dispersion and secondary pollutant formation is complicated and influenced by more than inner-city HDT emissions.

Discussion
The current top-down emission inventories are mainly driven by statistical data, which creates challenges in improving the resolution, and input data are usually delayed by several months. Here, we develop a big-data-driven approach and demonstrate both the capability of the method and the advantages of the highresolution truck trajectory big data. We estimate the day-to-day   HDT emission in the BTH region for 2017 and 2018; our results suggest the importance of including non-local HDT emissions in local emission inventories. Both the total emissions amount and their distribution are better estimated with higher resolution under this approach. Temporal profiles for monthly, daily and hourly emissions allocation were summarized for application in a top-down emission inventory. However, for spatial allocation, our analysis indicates that no simple allocation scheme has good correlation with the big-data results. The TrackATruck system can tolerate different BDS or GPS data sampling frequencies using the similarity calculation module. This study offers novel insights into local policy evaluation and suggests the need for regional joint control strategies. In Beijing, the LEZ promotes the updating of HDT fleets within the restricted area, but the demand for HDTs has not changed. Thus, approximately 96% of the emission reduction benefit from restricted HDTs was offset by an increase in unrestricted vehicles in the restricted area. From a regional perspective, it is difficult to achieve a win-win result on emission control through this restriction for local and surrounding areas, as the long-haul HDT detour caused by the LEZ may increase HDT emissions in neighbouring cities. In Beijing, those neighbouring cities are the upwind regions when severe air pollution occurs. Thus, when considering the long-range transport of air pollutants, the overall air quality benefit is still in doubt. In addition, this seesaw effect creates a new problem of responsibility allocation for HDT emissions control among multiple cities.
Environmental research aimed at protecting public health needs to reflect the high concentration of pollutants in hot spots rather than just the long-term and large-scale average. Schools, hospitals, transportation hubs and so on are more vulnerable to dynamic traffic emissions, but traditional environmental statistics are clearly not enough to protect the population in these areas. Moreover, the coverage density of traffic and environmental monitoring networks is not sufficient to indicate the impact of traffic emissions on each sensitive area. Therefore, it is necessary to introduce traffic big data to build a fine-grained picture of dynamic emissions. However, no large-scale system collects vehicle position data every second, although it is the most suitable data source for emission simulation, so there is a clear need for a bridge method worldwide. Our study establishes a bridge from low-precision but full-size vehicle travel big data to fine emissions. TrackATruck can be promoted in countries and regions where information-related technology and vehicle monitoring systems have been developed. For example, large-scale implementation of vehicle navigation systems run by third parties or manufacturers and other monitoring platforms could be prerequisites for enabling application of this model to a region. These prerequisites exist in many countries, including the United States, Europe, Japan, South Korea, etc. Maintaining personal privacy in this type of data collection is a major cause of public concern, but this problem is not insurmountable, as the current technology is able to erase personal information in the process of collecting data or providing data to researchers. The key is whether the third-party platforms, vehicle manufacturers, or freight companies that own the data are willing to do such additional work, as it is not profitable. However, if incentives or lobbying strategies promoting environmental public welfare can be used to motivate companies, it is possible to obtain vehicle travel big data with personal information erased. This is exactly the process in which this research is realized. If these big data can be used and combined with the TrackATruck method, it can provide more dynamic vehicle emissions data.
Our study also demonstrates how large the difference can be in vehicle emissions estimation when using different approaches. The difference of regional HDT emissions is significant between the several studies. For county-level emissions, the difference in individual counties by different methods can reach more than 15 times. Even if total amounts were set the same, the top-down methods also underestimated HDT emission on primary cargo routes/terminals by a multiple of 2-10 and overestimated those on other routes. The day-to-day emissions revealed that certain special events, including new control policies and holiday effects, may cause dramatic changes in HDT emissions. Even for regions that cannot implement the same method for various reasons, this result is still useful as it can help them estimate the uncertainty of air pollution sources. In addition, more than 200 cities globally have implemented vehicle traffic control to varying degrees 22 . The analysis of Beijing's LEZ in this study can be a reference to formulate more comprehensive vehicle emission control measures in these places.
This study has some uncertainties because of the imperfect input data. First, the BDS big data do not cover 100% of all HDTs at all times. Based on our QA/QC analysis (see Methods), there is still a 30% chance of HDT data loss due to vehicles not being equipped with the BDS or incorrect data transfer. Even for those trucks in the database, shielded BDS signals in mountainous areas and specific fields can lead to the underestimation of HDT emissions in these areas. In addition, the emission factors, which were not improved in this study, still have some uncertainty.

Methods
Data description and QA/QC. The scope of this study is HDT emissions in the BTH region of China. The population distribution data are from the LandScan database. The HDTs defined in this study refer to trucks with a gross weight of more than 12 tons. The GDP, freight volume and new HDT registration of Beijing are from the database of the National Bureau of Statistics of China (NBSC). The source of the BDS data for this study is the Nationwide Road Freight Vehicles Public Supervision and Service platform of SINOIOV (SINOIOV platform). HDT BDS data in the BTH region on the SINOIOV platform were collected, and the time range was 1st Jan. in 2017 to 31st Dec. in 2018. The main function of the SINOIOV platform is to support the digital management of road freight transportation in China. Currently, nearly 6 million trucks have been connected to the SINOIOV platform, covering 96% of medium/HDTs used in the Chinese freight market. However, some trucks did not report data for various reasons, e.g., policy controls, old truck retirement and not reported to the government. Therefore, this study further checked the coverage of HDTs by performing a QA/QC analysis on the raw data. We counted the number of HDTs collected from the SINOIOV platform and compared this number with the HDT registration data in NBSC. The number of HDTs from SINOIOV versus that from the government database is approximately 0.7:1, the difference stemming from the features of these two databases. For the SINOIOV database, only active trucks were registered and could report data. For the government database, both active and rarely used trucks were included. Therefore, we believe 70% is the real ratio of HDTs registered in the government databases that are still active. More than 80% of the trajectories had BDS sampling frequencies higher than 1/30 Hz. Less than 0.01% of HDTs had missing information in terms of the registration place and year, making it impossible to judge their emissions category. Therefore, the related trajectories of these trucks were excluded from the calculation, but they account for less than 0.01% of the total trajectory number.
TrackATruck approach for HDT emissions. TrackATruck is a bottom-up approach that considers the individual differences in trucks and constructs a highresolution emission inventory. Previous studies have shown that the instantaneous emission rate is greatly influenced by the instantaneous driving state, such as the driving speed, acceleration, and vehicle specific power (VSP). Based on the driving speed, acceleration, and VSP, the US EPA MOVES model 40 divides the stable driving states of a vehicle into 23 operating modes (named Opmodes in the following text). The Opmodes in TrackATruck were defined in the same way as in the US EPA MOVES model. Thus, TrackATruck must build a link from low-frequency BDS data (1/30 Hz) to the estimated Opmodes, which are commonly calculated by 1-Hz GPS data 41,42 . The framework used in TrackATruck is shown in Fig. 7.
In the first step, several continuous BDS signals of HDTs were treated as a trajectory of 300-400 s. According to our analysis of the BDS sampling frequency, the trajectory contained approximately 10 BDS signals with a sampling frequency of 1/30 Hz. Each trajectory from the BDS signals of an HDT (referred to as the 1/30-Hz trajectory in the following text) must start from the last BDS record of the previous trajectory.
In the second step, the model estimates the Opmodes distribution of the 1/30trajectory based on the similarity between the 1/30-Hz trajectory and 1-Hz HDT GPS trajectory (called the 1-Hz trajectory in the following text) dataset in the simulated Opmodes and emissions (SOME) model. The SOME model has a database with 1-Hz GPS data, which were derived from the database given in our previous research [43][44][45][46] . In this database, multiple 1-Hz trajectories were selected based on the speed distribution of the target trajectories of the 1/30-Hz trajectory. The condition for selection is that the mean speed of 1-Hz trajectory full-in the upper and lower limits of the 95% confidence interval of the speed distribution of the 1/30-Hz trajectory. Then, the SOME model uses the following equation to calculate the distribution of Opmodes for a 1/30-Hz trajectory based on the selected 1-Hz trajectories: where μ max and μ min are the upper and lower limits of the 95% confidence interval of the speed distribution of the 1/30-Hz trajectory m.ν max and ν min are the upper and lower limits of the 95% confidence interval of the speed distribution of the 1-Hz trajectory n. These limits were calculated by the mean and standard deviation of the speed along the trajectory. R μ is the set of μ max and μ min . R ν is the set of ν max and ν min . The unit for all speed values is m s −1 .
In the third step, the model calculates the emission rates of each trajectory based on the distribution of the Opmodes and the technical parameters of the HDT.  Fig. 7 The TrackATruck framework for the HDT emission calculation used in this study.
Step 1 is to convert HDT BDS data to the trajectories. One trajectory contains multiple continuous BDS data.
Step 2 is to calculate the distribution of the operating modes for each trajectory, and Step 3 is to calculate the emissions for each trajectory. The simulated operating modes and emissions model is used to simulate the distribution of the operating modes by speed distribution of a trajectory and calculate emissions based on the emission rates of the operating modes.
Step 4 is to aggregate that HDT emissions by all HDT trajectories, combining with various maps, e.g. provincial border, to the temporal-spatial characteristics. where E m,j is the emissions of the 1/30-Hz trajectory m and pollutant j. t m is the duration of the 1/30-Hz trajectory m, the unit is s. er i,j is the emission rate of Opmode i and pollutant j, the unit is g s −1 . BIN m,j is the normalized frequency of the simulated Opmode. The emission rates of Opmodes for all HDT categories are estimated based on the HDT emission factors from the emission inventory (EI) guidebook by the Ministry of Ecology and Environment of the People's Republic of China 47 and the portable emission measurement system (PEMS) data of the on-road truck examined in our previous works 45,46 . The EI guidebook provides the recommended HDT emission factors for guiding HDT emission inventories in China (Table 1). To map the HDT emission factors to the emission rates of MOVES's Opmodes, we established the Opmode emission rates for a benchmark HDT model based on our PEMS data. Next, we estimated the Opmode emission rates of other HDT models based on the relationship of emission factors between the benchmark and other HDT models ( Table 1).
The fourth step uses statistics for the HDT emissions based on the emissions of the HDT trajectories and map information to generate time series and maps of HDT emissions. For each HDT trajectory, we allocate the emissions of a trajectory to grids by the time duration.
Model validation. This study validates the second and third steps in the Track-ATruck model with real-world measurements from five HDTs. We use the SOME model to calculate the simulated Opmode distribution of the test trajectories and then calculate the pollutant emissions of the trajectory by the actual and simulated Opmode distributions, respectively. Figure 8 shows the estimated emission dis-tributions in the test data. For NO x and PM 2.5 , the calculated emissions of the two Opmodes are better correlated, and Pearson's correlations are 0.93 and 0.87, respectively. The slopes of the linear fits of the two results are 1.04 (NO x ) and 0.95 (PM 2.5 ), indicating that the average ratio of the two results is close to 1:1. The mean absolute percentage error (MAPE) of the two results is 9.47% (NO x ) and 16.75% (PM 2.5 ), indicating that the relative differences between the two results are nonsignificant. Overall, there are small differences between the simulated Opmodes and the actual Opmodes when calculating the emissions of an HDT trajectory, which indicates that the SOME model can better estimate the emissions under the Opmode-based model based on the speed distribution of the HDT trajectory.
Test of different aggregated proxies for spatial allocation. To compare our high-resolution approach with top-down methods, this study allocates the same total emissions into 0.01˚grids by different proxy schemes. The proxy descriptions for these aggregation methods are shown in Table 2. First, for each scheme, we calculated the proxy's value for each grid, e.g., the population in a 0.01°grid. Second, for each scheme, we converted these values into the allocation weights for each grid based on the total proxy value in the BTH region. Third, we allocated the total HDT emissions to 0.01˚grids based on the allocation weights for each scheme. Finally, we calculated Pearson's correlation of the 0.01˚HDT emissions between the aggregated proxy methods and the BDS big-data method of this study. The proxy descriptions for these aggregation methods are shown in Table 2.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.
x Fig. 8 The correlations of the estimated NO x and PM 2.5 emissions of HDT trajectories between the observed and simulated operating modes. Each point represents that a HDT trajectory. The slops of the dashed lines are 1. *Source data are provided as a Source Data file.  17 Density of road network, multiplied by VKT allocation weights: Highways (52%), National roads (29%), Provincial roads (11%), Other roads (8%) M4-2 16 Density of road network, multiplied by VKT allocation weights: Highways (39%), National roads (25%), Provincial roads (19%), Other roads (17%) M4-3 b Density of road network, multiplied by VKT allocation weights: Highways (58%), National roads (19%), Provincial roads (11%), Other roads (12%) M4-4 Density of road network, multiplied by VKT allocation weights: Highways (52%), National roads (18%), Provincial roads (19%), Other roads (11%) a ei,g is the 0.01˚gridded HDT emissions for pollutant i and grid g. Ei is the total amount of HDT emissions for pollutant i in the BTH region. pg is the number of populations in grid g. rg is the length of the road in grid g. b The VKT allocation weights of M4-3 and M4-4 are the optimized results based on M4-1 and M4-2 by machine learning 32 .

Data availability
The source data underlying Figs. 1a-b, 1a, 3a, 4a-f, 5a-d, 6e and 8 are provided as a source data file.xlsx. The Opmode emission rates, test HDT information in model validation, GDP and freight volume in Beijing, and the number of new HDTs in the BTH region are also provided in the source data file.xlsx. The source data file.xlsx is available online through the permanent repository under an Apache license 2.0 (https://github. com/fanyuandeng/TrackATruck). The GDP, freight volume and new HDT registration in Beijing and BTH region are also available online through NBSC website (http://www. stats.gov.cn/english). The population data used in this study is available on the LandScan website (https://landscan.ornl.gov/landscan-datasets). The road network used in this study is provided by the National Platform for Common Geospatial Information Services (https://www.tianditu.gov.cn), which is available after applying and obtaining permission.

Code availability
The code of TrackATruck model validation and corresponding test data are available online through the permanent repository under an Apache license 2.0 (https://github. com/fanyuandeng/TrackATruck). The R package of optimized method by Ye 32 is available on CRAN of R project (https://cran.r-project.org/web/packages/Rsolnp/index. html) 48 . Other R packages are also mentioned in the permanent repository.