## Main

Estimating the distribution of air pollutants over space and time is a crucial challenge for both climate change and human health. In urban environments, the air pollution generated by vehicle emissions has become increasingly evident, to the point that the temporary interruption of regular traffic during the COVID-19 lockdowns resulted in a tremendous decrease in CO2 emissions1,2,3,4. Even if the impact of this brief period on the epochal challenge of climate change is negligible5, it helps to outline the impact of transport-related emissions on our everyday life. Greenhouse gas (GHG) emissions from this sector have doubled since 1970, and, in 2016, 11.9% of global GHG emissions were from road transport (60% of which was from passenger travel)6,7. Moreover, the transport sector emits non-CO2 pollutants such as nitrogen oxides, ozone, particulate matter and volatile organic compounds, which play a fundamental role in climate change and are dangerous for human health6. Among the Sustainable Development Goals to be reached by 2030 (ref. 8), the United Nations posed an urgent call for action to reduce “the adverse per capita environmental impact of cities, including by paying particular attention to air quality”8. In this regard, measuring vehicle emissions is a prerequisite for designing policies to reduce transportation emissions.

Based on the available data, existing methods to quantify vehicle emissions range between two extremes. On the one hand, some approaches rely on measurements performed on small samples of vehicles (usually fewer than ten) but with high spatiotemporal resolution, such as those coming from particulate sensors9 or portable emissions measurement systems (PEMS)10,11. These sensors measure emissions in real-world driving conditions, producing accurate estimates, but the limited sample size makes it hard to derive generalizable patterns from them. For example, two studies10,11 analysed PEMS measurements from one and three light-duty vehicles, respectively, finding that the highest emissions are associated with the urban part of their routes, flat roads and low speed.

On the other hand, some studies cover almost the entire fleet of a region, for example, using odometer readings obtained from annual safety inspections. These inspections provide data on the age, fuel type, engine volume and distance travelled of each vehicle, and are used in macroscopic models to estimate annual emissions. For example, two studies12,13 used odometer readings to compute mean annual emissions for UK postcode areas and to explore the effects of the built environment (for example, work accessibility) on the annual kilometres travelled by vehicles in Boston, respectively. Unfortunately, odometer readings miss critical information such as instantaneous speed and acceleration14,15,16,17, making it challenging to track emissions over time and map them to suburban areas.

Global Positioning System (GPS) traces generated by in-vehicle devices stand as a trade-off between these two extremes. Depending on the provider’s market penetration, they can cover a representative fraction of the vehicle fleet18 and allow the instantaneous speed and acceleration to be computed, which are then used within microscopic models to obtain emissions estimates with high spatiotemporal resolution. GPS traces describe human mobility in great detail19,20,21,22 and offer an unprecedented tool to implement strategies such as reducing congestion, improving vehicle efficiency and shifting to lower-carbon options23,24,25,26,27,28,29. Given these peculiarities, several studies used GPS traces to analyse vehicle emissions at different spatiotemporal scales30,31, investigate the relationship between emissions and the urban environment32, vehicle kilometres travelled and fuel consumption33, or trip rates and travel mode choice34. Other studies concentrated on congestion-related emissions35 or braking36, emissions associated with ride-hailing37 and bus stop positioning38, the impact of urban policies39, methods for emission modelling40,41 and air quality monitoring9.

Despite this variety of literature, it remains unclear what statistical patterns characterize the distribution of emissions per vehicle and road, how these distributions change in time and space, and how we can exploit this information to simulate emission reduction scenarios. For example, although it is reported that the distribution of emissions from on-road remote sensing sites across vehicles is skewed42,43,44, this finding has been questioned given the inherent limitations of this type of measurement45.

In this study, we analysed the estimated emissions of several air pollutants from thousands of private vehicles moving in different European cities. We used trajectories produced by onboard GPS devices to compute the vehicle emissions and matched the obtained emissions to the cities’ road networks. We then studied how the emissions distribute across vehicles and roads to discover the statistical patterns that characterize emissions and investigated the relationships between emissions, human mobility and the road network’s characteristics. Finally, we simulated two emission reduction scenarios in which a share of vehicles become zero-emission or limit their mobility, identifying strategies to drastically reduce emissions over a city while minimizing the share of vehicles targeted.

Our framework, which applies to any city for which vehicle GPS trajectories and road network data are available, may provide practical support for decision-makers to implement strategies to reduce emissions, improve citizens’ well-being and design more sustainable cities46,47,48.

## Results

### Computation of emissions

We used anonymous GPS trajectories describing 423,018 trips from 16,715 private light-duty vehicles moving in Greater London, Rome and Florence throughout January 2017 (Table 1). The spatiotemporal patterns of the vehicle trajectories are stable across the cities and the seasons of the year (Supplementary Note 1).

The trajectories were produced by onboard GPS devices, which automatically turn on when the vehicle starts, transmitting a point every minute to the server via a General Packet Radio Service connection18,19,20. When the vehicle stops, no points are logged or sent. The GPS traces are collected by a company that provides a data collection service for insurance companies. The market penetration of this service is variable, but, in general, it covers at least 2% of the total registered vehicles, and it is representative of the overall number of vehicles circulating in a city18. Figure 1a shows a sample of trajectories for 20 vehicles in Rome.

We defined a methodological framework to compute vehicle emissions from their raw GPS trajectories (Extended Data Fig. 1). We filtered the GPS trajectories so that the time between consecutive points is below a threshold θ (see Methods and Supplementary Note 2). For each vehicle, we estimated the instantaneous speed and acceleration at each point of its trajectory and filtered out points with unrealistic values (see Methods and Fig. 1b). We used a nearest-neighbour algorithm to assign the points to the cities’ roads based on the road networks downloaded from OpenStreetMap49 (see Methods).
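The nearest-neighbour assignment of points to roads can be illustrated with a brute-force sketch. It assumes that points and roads are already projected to planar coordinates in metres and that each road is a single straight segment; both are simplifications, and the function names `point_segment_distance` and `assign_to_nearest_road` are hypothetical rather than part of the study's code.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (planar coordinates, metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def assign_to_nearest_road(points, roads):
    """Map each point to the id of the closest road segment (brute-force search)."""
    return [
        min(roads, key=lambda rid: point_segment_distance(p, *roads[rid]))
        for p in points
    ]

# Hypothetical toy network: two parallel horizontal roads 50 m apart.
roads = {"road_A": ((0, 0), (100, 0)), "road_B": ((0, 50), (100, 50))}
print(assign_to_nearest_road([(10, 5), (60, 45)], roads))  # ['road_A', 'road_B']
```

At city scale, a spatial index (for example, a k-d tree or an R-tree) would replace the brute-force `min` over all roads.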

The three cities are heterogeneous in their road networks: Rome is large, but with the sparsest network; London is huge, but with the densest network; Florence is small (~1/12 of Rome and ~1/15 of London in terms of land area), but with a dense road network (see Supplementary Note 3 for details).

We employed a microscopic emissions model30 that uses speed, acceleration and fuel type to estimate the instantaneous vehicle emissions of CO2, nitrogen oxides (NOx), particulate matter (PM) and volatile organic compounds (VOC; see Methods). Finally, we computed each vehicle’s overall emissions as the sum of all its instantaneous emissions during the period of study. Analogously, we computed the overall amount of air pollutants on each road by summing all the instantaneous emissions from any vehicle passing along that road during the same period (Fig. 1c).
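The aggregation step amounts to two grouped sums over the map-matched points. The sketch below assumes that each point has already been reduced to a hypothetical `(vehicle_id, road_id, grams)` record; it illustrates the bookkeeping rather than reproducing the study's implementation.

```python
from collections import defaultdict

def aggregate_emissions(records):
    """Sum instantaneous emissions per vehicle and per road.

    records: iterable of (vehicle_id, road_id, grams) tuples, one per
    trajectory point after filtering and map matching.
    """
    per_vehicle = defaultdict(float)
    per_road = defaultdict(float)
    for vehicle_id, road_id, grams in records:
        per_vehicle[vehicle_id] += grams  # total emitted by this vehicle
        per_road[road_id] += grams        # total emitted on this road
    return dict(per_vehicle), dict(per_road)

records = [("v1", "r1", 2.0), ("v1", "r2", 1.5), ("v2", "r1", 0.5)]
per_vehicle, per_road = aggregate_emissions(records)
print(per_vehicle)  # {'v1': 3.5, 'v2': 0.5}
print(per_road)     # {'r1': 2.5, 'r2': 1.5}
```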

### Patterns of emissions

We found, for all three cities, that the emissions were distributed heterogeneously across vehicles: a few vehicles, which we call gross polluters, were responsible for a tremendous amount of the emissions, whereas most vehicles emitted considerably less (Fig. 2). The distribution of emissions per vehicle has a Gini coefficient higher than 0.55 for all the cities and pollutants (Supplementary Note 4). In line with previous studies42,43,44, we found that the top 10% of gross polluters in Florence, Rome and London were responsible for 47.5, 50.5 and 38.5% of the total CO2 emitted during the month, respectively. The distributions of CO2 emissions per vehicle of Rome and Florence are well approximated by a truncated power law with probability density function $$p(x)\propto {x}^{-\alpha }{{\mathrm{e}}}^{-\lambda x}$$, with parameters α = 1.13 and λ = 1.04 × 10−3 for Rome (Fig. 2e), and parameters α = 2.12 and λ = 1.45 × 10−3 for Florence (Fig. 2h), where e is the base of the natural logarithm. Similarly, London’s distribution is well approximated by a stretched exponential with probability density function $$p(x)\propto {x}^{\beta -1}{{\mathrm{e}}}^{-\lambda {x}^{\beta }}$$, with parameters λ = 5.7 × 10−4 and β = 1.26 (Fig. 2b). These results are consistent with those we obtained for the other three pollutants (NOx, PM and VOC): a truncated power law approximates the distribution well for Rome and Florence, and a stretched exponential approximates it well for London (see Supplementary Notes 4 and 5 for details).
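The two summary statistics used above, the Gini coefficient and the share of emissions produced by the top 10% of emitters, can be computed as follows. This is a minimal sketch using the standard rank-based Gini formula; the function names are hypothetical, and the lognormal sample is synthetic, not the study's data.

```python
import numpy as np

def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    x = np.sort(np.asarray(values, dtype=float))  # ascending order
    n = x.size
    ranks = np.arange(1, n + 1)
    # Rank-weighted formula: G = 2*sum(i*x_i) / (n*sum(x)) - (n+1)/n
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)

def top_share(values, fraction=0.10):
    """Share of the total produced by the top `fraction` of emitters."""
    x = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    k = max(1, int(round(fraction * x.size)))
    return float(x[:k].sum() / x.sum())

# Synthetic heavy-tailed sample standing in for per-vehicle CO2 totals.
sample = np.random.default_rng(1).lognormal(mean=5.0, sigma=1.2, size=5_000)
print(gini(sample), top_share(sample))
```

The same two functions apply unchanged to the per-road totals discussed next.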

The picture is similar when considering the distribution of emissions per road: a few grossly polluted roads suffered a substantial quantity of emissions, whereas most roads suffered substantially less. The distributions for all the cities and pollutants have a Gini coefficient higher than 0.64 (Supplementary Note 4) and are well approximated by a truncated power law, with parameters α = 1.55 and λ = 1.08 × 10−4 for Rome (Fig. 2f), α = 1.52 and λ = 1.30 × 10−4 for Florence (Fig. 2i), and α = 2.59 and λ = 2.88 × 10−4 for London (Fig. 2c). Both α and λ are higher for London than for Rome and Florence, denoting a more even distribution of emissions per road (Supplementary Fig. 13). In Florence and Rome, the top 10% of grossly polluted roads are associated with more than 90% of the CO2 emitted during the period. In London, this share is lower (56.7%) but still more than half of the city’s total CO2 emissions. Again, we found similar results for the other pollutants (Supplementary Note 4).

The above results also held in other seasons of the year (Supplementary Note 1). Moreover, neither the sample size of the dataset nor the choice of the filtering parameter θ affected our conclusions: the shape of the distributions held even when we substantially reduced the sample size or changed θ (Supplementary Note 6).

### Relationship with mobility and road features

To investigate the relationship between a vehicle’s emissions and mobility patterns, we computed Spearman’s correlation coefficient between the emissions and three mobility metrics (see Methods): the radius of gyration, indicating the characteristic distance travelled by an individual50,51; the mobility entropy52,53,54, characterizing the predictability of their visitation patterns; and the total travel time of the vehicles, a principal factor governing emissions. The travel time is positively correlated with emissions in all three cities, although the strength of the correlation varies from city to city. The radius tends to correlate positively with a vehicle’s emissions, whereas the entropy correlates negatively. In London, the travel time has a strong positive correlation (0.98) with the emissions (Table 2), the radius has an almost null correlation (0.09) and the entropy has a strong negative correlation (−0.72). In Rome, the correlation with travel time is high (0.8), and the radius (0.58) and entropy (−0.54) are positively and negatively correlated with the emissions, respectively. In Florence, the correlation with travel time is lower (0.48), and the radius and entropy behave as in Rome: the former is positively correlated (0.30) and the latter negatively (−0.29).
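Spearman’s coefficient is the Pearson correlation of the ranks of two variables, so it captures any monotone relationship, not only linear ones. A sketch with `scipy.stats.spearmanr`, on a toy set of five hypothetical vehicles whose values are made up for illustration:

```python
from scipy.stats import spearmanr

# Hypothetical per-vehicle metrics (one entry per vehicle).
emissions = [120.0, 340.5, 80.2, 560.0, 210.3]  # grams of CO2
travel_time = [3.1, 7.9, 2.0, 12.4, 5.5]        # hours
entropy = [2.8, 1.9, 3.1, 1.2, 2.2]             # bits

# spearmanr returns the coefficient and a p-value.
rho_time, p_time = spearmanr(emissions, travel_time)
rho_entropy, p_entropy = spearmanr(emissions, entropy)
print(rho_time, rho_entropy)
```

In this toy data, travel time ranks the vehicles exactly as emissions do and entropy ranks them in exactly the reverse order, so the coefficients are +1 and −1 up to floating-point rounding.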

As could be expected, the more a vehicle travels, the more emissions it produces. However, the vehicles with more regular and predictable behaviour generate the highest emissions, not those with more erratic behaviour. Indeed, mobility entropy is low when a vehicle performs a high number of recurring trips, indicating predictable travelling patterns. In contrast, it is high when the vehicle performs trips from various origins and destinations, denoting a more unpredictable travelling behaviour. The observed negative correlations suggest that gross polluters are more regular and predictable than low-emitting vehicles.

To deepen our understanding of these relationships, we used a generalized additive model55 to express the emissions as a non-linear combination of the three mobility measures (Supplementary Note 7). We found that the radius and entropy contribute in opposite ways to a vehicle’s emissions. On the one hand, for Rome and Florence, the greater a vehicle’s radius of gyration, the greater its emissions (Supplementary Figs. 25 and 26). For London, the marginal contribution of the radius of gyration to the emissions is constant for radii of >7 km (Supplementary Fig. 24). On the other hand, for Rome and Florence, the greater a vehicle’s entropy, the lower its emissions (Supplementary Figs. 25 and 26). For London, the negative marginal contribution of the entropy to the emissions only holds for entropy values of >0.7 (Supplementary Fig. 24). We also performed a cluster analysis to group vehicles based on their radius of gyration, mobility entropy and travel time. We found two clusters, namely predictable and erratic drivers, and found that the former typically emit more than the latter (Supplementary Note 7 and Supplementary Fig. 27).

Interpreting these results requires further analyses based on additional data about the motivations behind each vehicle’s trips (for example, drivers’ mobility diaries). For instance, the erratic vehicles could emit less because they are used mainly for sporadic excursions to unfamiliar locations. In contrast, predictable drivers could be forced to use private vehicles because they live or work in neighbourhoods poorly served by public transportation. The heterogeneity of the emission distributions could also be related to socioeconomic inequalities and the centre–periphery divide. For example, “new low-density settlements that often take on the character of sprawl and rely exclusively on private transportation” have been developed in Rome56.

### Simulation scenarios

Reducing emissions is becoming increasingly important for cities, and it is crucial to estimate the impact of policies that target vehicles to reduce their footprint on a city’s environment. We investigated the impact of vehicle electrification on the total amount of emissions and on their distribution across roads. In particular, we studied how electrifying a certain share of vehicles would change the emissions on the roads of the three cities. In this setting, an electrified vehicle creates no emissions, so any change in its driver’s mobility behaviour after electrification does not affect the results.

We found that electrifying just the top 1% of gross polluters would reduce emissions as much as electrifying a random 10% of vehicles. In Fig. 3, we show a case study for the entire city of Rome, as well as for a single neighbourhood, to investigate the impact of massive electrification on emissions. The results for London and Florence are provided in Supplementary Note 8. As the number of gross polluters shifting to electric engines grows, the reduction of emissions on the roads becomes more evident. In particular, if the top 10% of gross polluters shift to an electric engine, 107 roads would experience a notable reduction in CO2 emissions per metre of road (at least 0.01 g m−1; Table 3 and Fig. 3d). In contrast, if the 10% of vehicles that shift to electric engines are chosen at random, only 18 roads would have such a substantial reduction (Fig. 3b). These results hold both for single neighbourhoods and for the entire city (see Fig. 3e,f for Rome and Supplementary Note 8 for London and Florence).
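The comparison between targeted and random electrification can be sketched as follows, assuming that an electrified vehicle’s emissions drop to zero and nothing else changes. The function `reduction_curve` is hypothetical, and the lognormal emissions are synthetic stand-ins for per-vehicle totals; a per-road version would subtract each removed vehicle’s contribution from every road it used.

```python
import numpy as np

def reduction_curve(per_vehicle, shares, targeted=True, seed=0):
    """Percentage of total emissions removed when a share of vehicles
    becomes zero-emission.

    targeted=True electrifies the highest emitters first (gross polluters);
    targeted=False electrifies a random subset of the same size.
    """
    e = np.asarray(per_vehicle, dtype=float)
    if targeted:
        order = np.argsort(e)[::-1]  # largest emitters first
    else:
        order = np.random.default_rng(seed).permutation(e.size)
    # cumulative[k] = emissions removed after electrifying the first k vehicles
    cumulative = np.concatenate([[0.0], np.cumsum(e[order])])
    ks = np.round(np.asarray(shares) * e.size).astype(int)
    return 100.0 * cumulative[ks] / e.sum()

rng = np.random.default_rng(42)
emissions = rng.lognormal(mean=5.0, sigma=1.2, size=10_000)  # synthetic
shares = [0.01, 0.10]
print("targeted:", reduction_curve(emissions, shares, targeted=True))
print("random:  ", reduction_curve(emissions, shares, targeted=False))
```

Because the top-k emitters always remove at least as much as any other k vehicles, the targeted curve can never lie below the random one; how far above it lies depends on how heavy-tailed the per-vehicle distribution is.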

The percentage reduction of the overall emissions grows almost linearly with the share of electrified vehicles when these are chosen at random. In contrast, when the gross polluters are electrified first, the growth is well approximated by a generalized logistic function (GLF), also known as Richards’ curve57,58. We used non-linear least squares to fit the GLF, which describes the growth of a variable x as $$f(x)=\frac{\alpha }{{(1+\beta {{\mathrm{e}}}^{-rx})}^{1/\nu }}$$, where α represents the upper asymptote, β the growth range, r the growth rate and ν the slope of the curve. The fit gives R2 = 0.99 both for the selected neighbourhood (Fig. 3e) and for the whole city of Rome (Fig. 3f). The estimated growth rate r is 4.84 × 10−2 for the neighbourhood and 3.96 × 10−2 for the entire city of Rome. The slope ν is almost the same for the neighbourhood and the entire city (−1.55 and −1.56, respectively). The values of α and β are ~100 and −1, respectively, for both the neighbourhood and the city. Similar results hold for Florence (Supplementary Fig. 31b and Supplementary Table 11). In Greater London (Supplementary Fig. 31a), the growth starts slowly (ν = −0.86): there are fewer vehicles with high emission levels, and electrifying the most polluting vehicles is slightly less effective in reducing emissions than in the other two cities (Supplementary Table 11).
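A non-linear least-squares fit of the GLF can be sketched with `scipy.optimize.curve_fit`. The synthetic curve below uses parameters close to those reported for Rome plus Gaussian noise; the initial guess and the parameter bounds are assumptions of this sketch (β ≥ −1 keeps the base 1 + βe^(−rx) positive for x > 0, and ν is restricted to negative values to match the reported fits).

```python
import numpy as np
from scipy.optimize import curve_fit

def glf(x, alpha, beta, r, nu):
    """Generalized logistic (Richards) curve: alpha / (1 + beta*exp(-r*x))**(1/nu)."""
    return alpha / (1.0 + beta * np.exp(-r * x)) ** (1.0 / nu)

def fit_glf(x, y, p0=(100.0, -0.9, 0.05, -1.0)):
    """Fit the GLF by bounded non-linear least squares; return parameters and R^2."""
    # Bounds (an assumption): alpha in [0, 200], beta in [-1, 10],
    # r in [1e-4, 1], nu in [-10, -0.01].
    bounds = ([0.0, -1.0, 1e-4, -10.0], [200.0, 10.0, 1.0, -0.01])
    params, _ = curve_fit(glf, x, y, p0=p0, bounds=bounds)
    residuals = y - glf(x, *params)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - np.mean(y)) ** 2)
    return params, r2

# Synthetic data: x is the percentage of vehicles electrified (1..100),
# y the percentage reduction of total emissions.
x = np.arange(1, 101, dtype=float)
rng = np.random.default_rng(0)
y = glf(x, 100.0, -0.98, 0.04, -1.55) + rng.normal(0.0, 0.5, x.size)
params, r2 = fit_glf(x, y)
print("alpha, beta, r, nu =", params, "R^2 =", r2)
```

With β ≈ −1 and ν < 0, the curve starts near zero and rises towards α, which matches the shape described above; the random-selection case would instead be fitted with a straight line.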

Given the increasing importance of remote working, especially since the COVID-19 pandemic59,60, we simulated the impact of a massive shift to remote working on reducing vehicle emissions. This working style may affect individual mobility patterns in several ways; here, we assumed only that it eliminates commuting trips. Indeed, if an individual works from home, the most straightforward implication is the removal of commuting trips from their mobility habits. We identified vehicles’ home and work locations (see Methods) and studied the emissions generated from their commuting patterns. We then performed a simulation in which a growing share of these commuters become home workers, that is, they no longer travel between their home and work locations.

We found that the emission reduction is more effective when the home workers are gross polluters. In this case, remote working for the top 1% of gross polluters yields the same reduction as remote working for ~4% of randomly chosen vehicles (Supplementary Fig. 32). Again, a GLF fits the emission reduction well when the gross polluters become home workers. In particular, we obtained estimates for ν (the slope of the curve) that are similar for Rome and Florence (−1.30 and −1.35, respectively) and lower for London (−0.72; see Supplementary Note 8 and Supplementary Fig. 32 for details).

Overall, these results demonstrate that targeting specific profiles of vehicles can substantially improve emission reduction policies.

## Discussion

Using GPS data to estimate the emissions from thousands of vehicles in three European cities of different sizes and characteristics, we have shown here the existence of gross polluters, that is, vehicles responsible for the greatest quantity of emissions. The existence of gross polluters has been reported in previous studies using measurements from on-road remote sensing sites42,43,44; however, these studies have been questioned because measurements from on-road sites cannot represent a vehicle’s overall emission level45. Our study contributes to reshaping this discussion because our findings are based on a microscopic emission model that captures in great detail the instantaneous emissions of vehicles. We have added new elements to this debate, discovering that gross polluters exist in different cities and for different pollutants (CO2, NOx, PM and VOC) and reporting the existence of grossly polluted roads suffering the greatest amount of emissions.

The heterogeneous patterns governing the distribution of emissions across vehicles and roads are well approximated by heavy-tailed distributions, with exponents that vary from city to city and from pollutant to pollutant. These city-specific exponents may depend on the characteristics of the city’s road network and people’s commuting behaviour. For example, London has a vast and dense road network, and people use private vehicles less intensively61 than in Rome and Florence. Thus, one can argue that mobility behaviour in London leads to a distribution of emissions per vehicle that is more even than in Rome, which is characterized by a vast and sparse road network and intensive use of private vehicles62.

Our study can be reproduced for any city for which vehicle GPS trajectories and road network data are available, and may help to find more effective strategies to reduce emissions. For example, our study demonstrates that restricting circulation based on an uninformed choice (for example, banning vehicles with odd or even number plates) has less impact on reducing emissions than identifying and targeting a small share of gross polluters. Moreover, we have designed a precise model to estimate the overall reduction of emissions caused by the electrification of a particular share of vehicles or by reducing the number of commuting trips travelled by the vehicles (for example, because their drivers switch to working from home).

There are several directions in which this study can be extended. For example, because we focused on light-duty vehicles, all the results we have shown are valid for this fleet of vehicles only. Although they make up the vast majority of vehicles circulating in a city, we are aware that the absence of other vehicles, such as heavy-duty vehicles (for example, buses and trucks), generates an incomplete mosaic of the emissions within the urban environment. We hope, therefore, for a more comprehensive study that may include different types of vehicles.

Also, our analysis can be extended by investigating how the emission patterns vary between weekdays and weekends or with weather conditions, and by considering more sophisticated simulation scenarios. For example, it would be interesting to investigate the impact of policies that aim to improve walking, transit or cycling on the distribution of emissions, the number of gross polluters and grossly polluted roads. Finally, the relationship between vehicle emissions and mobility patterns may be examined in more depth to investigate whether the observed heterogeneous distributions originate from other inequalities (for example, socioeconomic inequalities and the centre–periphery divide).

Meanwhile, our study may shape the discussion on measuring emissions with digital data and how to use such measurements to simulate emission reduction scenarios. If we learn how to use such a resource, we have the potential to monitor in real time the level of emissions in our urban environments and take immediate, informed actions when they exceed a certain tolerance threshold. This is crucial because the decisions that policymakers take depend on what we measure, how good our measurements are and how promptly we react to these measurements.

## Methods

### Data filtering

In our GPS dataset, each trajectory point is associated with a vehicle identifier, a trajectory identifier, a timestamp and a latitude and longitude pair. The sampling rate of the trajectory points may affect the estimate of instantaneous speed and acceleration. Because the mean time interval between trajectory points varies from city to city (it is about 1 min for London and 4 min for Rome and Florence, see Table 1), we performed a pre-processing step to align them. For each trajectory, we retained only those subtrajectories (that is, disjoint subsets of points) that satisfy two constraints: (1) there are at least two points and (2) the time interval between consecutive points is less than θ = 120 s. The filtering step causes a drop in the number of points and, in consequence, of vehicles. We analysed the trends in the number of both vehicles and points resulting from the filtering step, varying the filtering parameter θ from 1 to 300 s (Supplementary Note 1). By choosing θ = 120 s, we lost 53.5% of the points in Rome, 1.4% in London and 70.4% in Florence. Consequently, we discarded 1.3% of the vehicles in Rome, 0.9% in London and 6.5% in Florence (Table 1).
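The subtrajectory filter can be sketched as follows: split each time-ordered trajectory wherever the gap between consecutive points reaches θ, then drop the pieces with fewer than two points. The function name is hypothetical; this sketch follows the rule stated above but is not the study’s implementation.

```python
from datetime import datetime, timedelta

def split_subtrajectories(timestamps, theta=timedelta(seconds=120)):
    """Split a time-ordered trajectory into subtrajectories in which consecutive
    points are less than `theta` apart, keeping only those with >= 2 points.

    Returns lists of indices into `timestamps`.
    """
    if not timestamps:
        return []
    kept = []
    current = [0]
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] < theta:
            current.append(i)  # gap below theta: same subtrajectory
        else:
            if len(current) >= 2:
                kept.append(current)
            current = [i]  # gap too large: start a new subtrajectory
    if len(current) >= 2:
        kept.append(current)
    return kept

t0 = datetime(2017, 1, 1, 8, 0, 0)
offsets = [0, 60, 90, 400, 900, 950]  # seconds since t0
ts = [t0 + timedelta(seconds=s) for s in offsets]
print(split_subtrajectories(ts))  # [[0, 1, 2], [4, 5]]
```

The isolated point at index 3 is discarded because both of its gaps exceed 120 s, which is how the filter reduces the number of points (and, when all of a vehicle’s points are discarded, the number of vehicles).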

In previous work, different values of θ were used (for example, the time interval between the points was set to 1 s (ref. 36), 3 s (ref. 37), 5 s (ref. 30) and 5–50 s (ref. 31)). Our choice derives from our data sampling rate and is a trade-off between the reliability of the results and the data coverage. As the last step, we computed, for each vehicle, the speed and acceleration at each point, and retained only those points with a speed of less than 300 km h–1 and an acceleration in the range −10 to +10 m s−2 (as suggested by Nyhan et al.30).
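Speed and acceleration can be derived from consecutive points with finite differences over haversine distances. In this sketch, the speed at a point is the average speed over the segment ending at that point, and the first point copies the first segment’s speed with zero acceleration; both are simplifying assumptions, not choices documented in the text. The thresholds in the hypothetical `plausible` helper are the ones stated above.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def speed_and_acceleration(points):
    """points: time-ordered list of (t_seconds, lat, lon) with at least two entries.
    Returns one (speed in m/s, acceleration in m/s^2) pair per point."""
    speeds = [
        haversine_m(la0, lo0, la1, lo1) / (t1 - t0)
        for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:])
    ]
    speeds.insert(0, speeds[0])  # assumption: first point takes the first segment's speed
    accels = [0.0]
    for i in range(1, len(points)):
        accels.append((speeds[i] - speeds[i - 1]) / (points[i][0] - points[i - 1][0]))
    return list(zip(speeds, accels))

def plausible(speed, accel, max_speed_kmh=300.0, max_abs_accel=10.0):
    """Keep points with speed < 300 km/h and acceleration within [-10, 10] m/s^2."""
    return speed * 3.6 < max_speed_kmh and -max_abs_accel <= accel <= max_abs_accel

pts = [(0, 41.900, 12.5), (10, 41.901, 12.5), (20, 41.903, 12.5)]
for speed, accel in speed_and_acceleration(pts):
    print(round(speed, 2), round(accel, 3), plausible(speed, accel))
```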

### Computing emissions

We implemented a microscopic emissions model30 to compute the instantaneous emissions associated with each trajectory point p. We denote the quantity of pollutant j ∈ {CO2, NOx, PM, VOC} emitted at point p by vehicle u as $${E}_{p}^{j,u}$$, and the instantaneous speed and acceleration of the vehicle at point p as sp and ap, respectively. Information about the engine type (petrol, diesel or liquefied petroleum gas) is available for each vehicle. Together with the type of pollutant, this information determines the emission factors f1, …, f6. We used the following equation to compute the instantaneous emissions $${E}_{p}^{j,u}$$ of pollutant j from vehicle u at point p:

$${E}_{p} ^{\,j,u}={f}_{1} ^{\, j,u}+{f}_{2} ^{\, j,u}{s}_{p}+{f}_{3} ^{\, j,u}{s}_{p}^{2}+{f}_{4} ^{\,j,u}{a}_{p}+{f}_{5} ^{\, j,u}{a}_{p}^{2}+{f}_{6} ^{\, j,u}{s}_{p}{a}_{p}$$
(1)

where, for NOx and VOC emissions, the factors f1, …, f6 change with acceleration (depending on whether ap ≥ −0.5 m s−2 or ap < −0.5 m s−2). Supplementary Table 13 shows how the factors f1, …, f6 vary with the vehicle’s fuel type and acceleration.
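Equation (1) translates directly into code. The factor values below are placeholders chosen only to make the example runnable, for a single hypothetical fuel type; the actual values per pollutant, fuel type and acceleration regime are those in Supplementary Table 13 and are not reproduced here. SI units (m s−1 and m s−2) are assumed for speed and acceleration.

```python
# Placeholder emission factors (f1..f6): NOT the values of Supplementary Table 13.
# For NOx and VOC, separate sets apply when a >= -0.5 m/s^2 ("accel") or a < -0.5 m/s^2 ("decel").
FACTORS = {
    "CO2": (1.0, 0.1, 0.01, 0.5, 0.05, 0.02),
    "PM": (1e-4, 1e-5, 1e-6, 5e-5, 5e-6, 1e-6),
    "NOx": {"accel": (1e-3, 1e-4, 1e-5, 5e-4, 5e-5, 1e-5),
            "decel": (5e-4, 5e-5, 5e-6, 0.0, 0.0, 0.0)},
    "VOC": {"accel": (2e-4, 2e-5, 2e-6, 1e-4, 1e-5, 2e-6),
            "decel": (1e-4, 1e-5, 1e-6, 0.0, 0.0, 0.0)},
}

def select_factors(pollutant, accel, factors=FACTORS):
    """Return the six factors; NOx and VOC depend on the acceleration regime."""
    if pollutant in ("NOx", "VOC"):
        regime = "accel" if accel >= -0.5 else "decel"
        return factors[pollutant][regime]
    return factors[pollutant]

def instantaneous_emission(speed, accel, f):
    """Eq. (1): E = f1 + f2*s + f3*s^2 + f4*a + f5*a^2 + f6*s*a."""
    f1, f2, f3, f4, f5, f6 = f
    return f1 + f2 * speed + f3 * speed**2 + f4 * accel + f5 * accel**2 + f6 * speed * accel

# Example: each pollutant for one point at 10 m/s and 1 m/s^2.
for pollutant in FACTORS:
    f = select_factors(pollutant, 1.0)
    print(pollutant, instantaneous_emission(10.0, 1.0, f))
```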

### Mobility measures and road centrality

We used three quantities to describe the mobility of a vehicle u:

• The radius of gyration18,19,50 $${r}_{{\mathrm{g}}}(u)=\sqrt{\frac{1}{n}{\sum }_{i\in P}{\mathrm{dist}}{\left({{\bf{r}}}_{i}(u),{{\bf{r}}}_{{\mathrm{cm}}}(u)\right)}^{2}}$$, where P is the set of n points recorded for u, ri(u) indicates the coordinates of trajectory point i ∈ P, rcm(u) is the centre of mass of u and dist is the haversine distance between two points on Earth

• The temporal-uncorrelated entropy52,53,54 $$S(u)=-\mathop{\sum }\nolimits_{i = 1}^{{N}_{u}}{p}_{u}(i){\log }_{2}{p}_{u}(i)$$, where Nu is the number of distinct locations visited by u and pu(i) is the probability that u visits location i

• The travel time of u, computed as the sum of all the travel times of its trajectories
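The two spatial mobility measures can be sketched as follows. Two simplifications are assumed: the centre of mass is approximated by the mean latitude and longitude (reasonable at city scale), and the entropy is computed over whatever location labels are passed in (for example, stop locations). The function names are hypothetical.

```python
import math
from collections import Counter

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def radius_of_gyration(points):
    """points: list of (lat, lon); centre of mass approximated by the mean lat/lon."""
    n = len(points)
    cm = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    return math.sqrt(sum(haversine_km(p, cm) ** 2 for p in points) / n)

def uncorrelated_entropy(visited_locations):
    """S = -sum_i p(i) * log2 p(i) over the distinct locations visited."""
    counts = Counter(visited_locations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(radius_of_gyration([(0.0, 0.0), (0.0, 0.02)]))  # about 1.112 km
print(uncorrelated_entropy(["home", "work", "home", "work"]))  # 1.0 bit
```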

We measured the centrality of a road, a proxy for its traffic volume in the city, as its betweenness centrality. In network science, the betweenness centrality of an edge e (that is, a road in our case) is defined as $${C}_{{\mathrm{b}}}(e)={\sum }_{s,t\in V}\frac{\sigma (s,t| e)}{\sigma (s,t)}$$, where V is the set of nodes in the network, σ(s,t) is the number of shortest paths between s and t, and σ(s,t|e) is the number of those shortest paths passing through edge e.
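Edge betweenness is available in NetworkX as `nx.edge_betweenness_centrality`. The toy network below is hypothetical. With `normalized=False` on an undirected graph, NetworkX counts each unordered pair (s, t) once, so its values are half those of a sum over ordered pairs. Passing `weight="length"` makes shortest paths follow road length rather than hop count, which is an assumption about how roads are weighted.

```python
import networkx as nx

# Hypothetical road network: nodes are intersections, edges are road
# segments with their length in metres.
G = nx.Graph()
G.add_weighted_edges_from(
    [("A", "B", 100), ("B", "C", 100), ("C", "D", 100),
     ("A", "D", 400), ("B", "D", 250)],
    weight="length",
)

# Raw (unnormalized) edge betweenness using length-weighted shortest paths.
cb = nx.edge_betweenness_centrality(G, normalized=False, weight="length")
for edge, value in sorted(cb.items()):
    print(edge, value)
```

In this toy network, segment B-C lies on the shortest paths of four of the six node pairs, whereas the long direct segments A-D and B-D lie on none. For large road networks, the `k` argument approximates the measure by sampling source nodes.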

### Home and work locations

The first step in identifying an individual’s home and work locations is selecting the starting and ending points of their trajectories65. The starting (or ending) points of trajectories that start from (or end at) the same semantic location may not coincide. This may happen (1) because a driver may park the vehicle anywhere within a certain radius of the location and (2) because the first point sent by the GPS device often lacks precision and is discarded, so the second point sent is taken as the starting point of the trajectory. For these reasons, we spatially clustered these points within a radius of 250 m and took the centroid of each cluster as the vehicle’s stop location.
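The spatial clustering of trajectory endpoints can be sketched with a simple greedy (leader) scheme. The text specifies only the 250 m radius, not the clustering algorithm, so the greedy assignment below is an assumption; density-based methods such as DBSCAN are a common alternative. The function names are hypothetical.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))

def cluster_stops(points, radius_m=250.0):
    """Greedy leader clustering: each point joins the first cluster whose current
    centroid is within radius_m, otherwise it starts a new cluster.

    Returns a list of (centroid, member_count) pairs.
    """
    clusters = []  # each entry: [sum_lat, sum_lon, count]
    for lat, lon in points:
        for c in clusters:
            centroid = (c[0] / c[2], c[1] / c[2])
            if haversine_m((lat, lon), centroid) <= radius_m:
                c[0] += lat
                c[1] += lon
                c[2] += 1
                break
        else:  # no cluster close enough
            clusters.append([lat, lon, 1])
    return [((c[0] / c[2], c[1] / c[2]), c[2]) for c in clusters]

endpoints = [(41.9000, 12.5000), (41.9005, 12.5000), (41.9500, 12.5000)]
for centroid, count in cluster_stops(endpoints):
    print(centroid, count)
```

Unlike DBSCAN, this greedy scheme depends on the order of the points, which is one reason to treat it only as an illustration.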

To identify a vehicle’s home and work locations, we used a principle commonly adopted in the literature65: the home location is the stop location corresponding to the most frequent cluster, and the work location is the stop location corresponding to the second most frequent cluster. We discarded the vehicles for which it is impossible to identify the most frequent stop location(s) (for example, because the vehicle visited each location only once). We successfully identified home and work locations for 55, 31 and 16% of the vehicles moving in London, Rome and Florence, respectively. There are two main reasons why we could not identify the home and work locations of many vehicles in Florence. First, the average number of trajectories per vehicle (10.7) is much lower than in the other two cities (27.6 in Rome and 43 in London; see Table 1). The fewer trajectories a vehicle has, the more difficult it is to identify its home and work locations. Second, although relatively small, Florence is an important hub for its surroundings, so many users could live outside the city. Moreover, even those who live inside the city could reach their workplace by public transportation or on foot, given the various restricted traffic areas in the city’s historic centre. This leads to a low number of commuting trajectories inside the city.
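The home/work rule can be sketched on top of the visit counts of a vehicle’s stop locations. How ties are treated is not stated in the text; this sketch assumes that a location is identifiable only if its count is strictly greater than the next one, which also discards vehicles that visited every location once. The function name is hypothetical.

```python
def home_and_work(stop_counts):
    """stop_counts: list of (stop_location, visit_count) pairs for one vehicle.

    Home = most frequent stop, work = second most frequent stop.
    Returns None when the ranking is not identifiable (fewer than two stops,
    or a tie among the top positions).
    """
    ranked = sorted(stop_counts, key=lambda sc: sc[1], reverse=True)
    counts = [c for _, c in ranked] + [0]  # sentinel for vehicles with exactly two stops
    if len(ranked) < 2 or not (counts[0] > counts[1] > counts[2]):
        return None
    return ranked[0][0], ranked[1][0]

print(home_and_work([("h", 10), ("w", 6), ("x", 2)]))  # ('h', 'w')
print(home_and_work([("a", 1), ("b", 1), ("c", 1)]))   # None
```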

### Reporting summary

Further information on research design is available in the Nature Research Reporting summary linked to this article.