The recent availability of an unprecedented amount of data has made possible quantitative studies of urban systems1,2,3, opening the way to a new Science of Cities. In particular, the discovery of allometric scaling relationships in cities has driven the quantitative research on urban systems in the past years. Indeed, there is a great amount of evidence that different socio-economic indicators in cities, such as the GDP, the crime rate, the number of patents as well as different structural indicators such as the total length of the road network, the urbanized land area, etc., exhibit robust scaling relationships with respect to population4,5,6,7,8,9,10. The existence of these simple scaling relationship hints at the existence of universal processes shared by urban systems and thus at the possibility of modeling cities.

A common trait shared by all complex systems –including cities– is the existence of a large variety of processes occuring over a wide range of time and spatial scales. The main obstacle to the understanding of these systems therefore resides in uncovering the hierarchy of processes and in singling out the few ones which govern their dynamics. Albeit difficult, the hierarchisation of processes is of prime importance. A failure to do so leads to models which are either too complex to give any real insight into the phenomenon, or too simple and abstract to have any resemblance with reality. As a matter of fact, despite numerous attempts5,11,12,13,14,15, a theoretical understanding of many observed empirical regularities in cities is still missing.

In the present study, we show that the spatial structure of the mobility pattern controls the behaviour of many quantities in urban systems. Indeed, cities are not only defined by the spatial organisation of places fulfilling different functions –shops, places of residence, workplaces, etc.– but also by the way individuals move among them. Understanding where people live, where and how they travel within the city thus appears as a necessary step towards a scientific theory of cities.

Although an increasing amount of data about mobility is now available16, we still lack a simple model explaining the dominant mechanisms governing the formation and evolution of mobility patterns. Many factors such as geographical constraints, facilities location and available transportation –to name a few– can impact the mobility and it thus appears as an intricate issue. Here, we tackle the problem of mobility by making simplifying –yet not simple– assumptions, trying to grasp the most important parameters which define the problem. We thus build upon a simple out-of-equilibrium model previously developed17. This model, among other things, accounts for the polycentric transition of cities and gives a prediction for the number of centers as a function of population. We show that this framework allows us to predict the behaviour of many quantities related to mobility and the structure of cities: the scaling with population of the total time wasted in congestion, transport related CO2 emissions, total travelled distance, total lane miles and surface area.

Our results allow us give a quantitative insight into two important debates around urban systems. First, we are able to discuss the benefits of polycentricity and quantify some of its aspects. Then, maybe more importantly, we are able to put into perspective the sustainability of urban systems.


Naive scalings

We start by presenting some naive arguments to estimate the scaling exponents for the area A, the total daily distance driven Ltot and the total lane miles LN. Although these predictions turn out to be wrong, naive scalings are useful as a first approach to the problem as they allow us understand how the different quantities relate to each other.

Surface area

First, we estimate the dependence of the area A of a city on its population P –a long standing problem in the field5. A first crude approach would be to assume that cities evolve in such a way that their population density ρ = P/A remains constant. This assumption straighforwardly implies that the area should scale linearly with population

where λ2 is the average surface occupied by each individual (the assumption of a constant density is then equivalent to the one of a constant average surface per capita).

Total length of roads

We now estimate the total length LN of all the roads within a city. If we consider that the network formed by streets is such that all the nodes (intersections) are connected to their closest neighbour, the typical length of a road segment is given by

where N is the number of intersections. Previous studies of road networks in different regions and over extended time periods18,19, have shown that the number of intersections is proportional to the population size. Therefore, the typical length of a road segment (between two intersections) varies with the population size P as

and the total length of the network LN ~ PℓR should then scale as

Using the naive scaling for the dependence of A on population size given previously in Eq. 1 we finally get

Total daily commuting distance

Individual constraint. We also estimate the total commuting distance Ltot. The first constraint on this distance comes from individuals's limitations and behaviour. We make here the simple assumption that individuals choose their residence and work place such that their total commuting distance is fixed (or at least smaller than a certain value) and equal on average to ℓC. In that case, we simply have

(by constant, we mean independent from the population size of the city).

The city structure constraint. An additional contraint on Ltot is given by the structure of the city8,25. Indeed, the individual commuting distance is also related to the total suface area of the city and the location of activity centers.

If we first assume that the city is monocentric, individuals are all commuting to the same center and the typical commuting distance is controlled by the typical size of the city of order

On the other hand, if we assume that the city is completely decentralized, the typical commuting distance is of order the nearest neighbour distance and we obtain

Comparison of naive scalings with empirical results

The comparison of the naive exponents with the exponents measured on US data is shown in Table 1 (see the Methods section for details about the data). There are important discrepancies, which we discuss in the following.

Table 1 This table displays the value of the exponent governing the behavior with the population P obtained by naive arguments and the value obtained from empirical data. The discrepancies reveal the failure of the naive scaling arguments and the necessity to go further and model mobility patterns. The data used for this table can be found in38,39,40

First, we note that the naive scaling for the surface area A predicts a value of the exponent that is quantitatively –and worse, qualitatively– different from that observed. Indeed, we find that for real cities

with a = 0.85. While the naive argument implies a linear dependence of the surface area A with population, we find a sublinear scaling in the data, which is a qualitatively different behavior (Table 1). This disagreement on this basic quantity will naturally impact the scaling of the other quantities.

The data also show that Ltot/P can be considered reasonably independent from P (with a value of approximately 23 miles for the US, see Fig. 1), in agreement with the individual constraint assumption (Eq. 6). This finding is also in agreement with the results drawn from census data in Germany by20. Although this assumption of a constant distance is simple and verified on the US data, we think that it deserves to be systematically tested on other datasets for other countries and cities.

Figure 1
figure 1

Constant daily driven distance per capita.

(a) daily total driven distance per capita as a function of population for 441 urbanised area in the US in 2010. The data shown in the plot are compatible with a population-independent behaviour. (b) Histogram of the daily total driven distance per capita for the same cities. The average daily driven distance for these cities is 23 miles and the standard deviation 7 miles.

Finally, the scaling of given in the extreme cases of a monocentric city structure and a totally decentralized city structure disagree with the value measured on data (see Table 1). This suggests that most cities have a structure that is neither completely centralized, nor totally decentralized. In particular, this result cast some doubts about the study15 which assumes implicitely that cities are always monocentric. Any situation between the two previous extreme cases would give a scaling of the form where b [1/2, 1]. One can easily see that this expression is consistent with that of A2 and Ltot/P if

which is indeed what we observe empirically (up to error bars). This preliminary analysis thus leads us to the conclusion that, in order to compute the various exponents, we need to better describe the structure of commuting patterns. In other words, we need to find a description of cities that goes beyond the naive monocentric or totally decentralized views and which accounts for the observed sub-linear scaling of the surface area A.

Beyond naive scaling: modeling mobility patterns

We begin with the assumption that mobility patterns are mostly driven by the daily commuting and we would like to understand how an individual, given his household location, will choose his job location. We assume that this choice will be determined by two dominant factors: the expected wage at a given job and the commuting time to this job's location. Indeed, places with high average salaries are attractive, but having to spend a sensible amount of time commuting every day is less desirable. We assume there are Nc potential activity centers in the city, each characterized by an average wage w(j) at location j. This wage is endogenously determined and depends a priori on many factors such as agglomeration effects21, the type of industry, etc. Although it is in principle possible to write down equations to determine the wage (as attempted in11 for instance), not only is it impossible to solve them, but also not necessarily useful. A similar situation arises in physics when one studies the behaviour of atoms made of a large number of electrons. Physicists found out22 that, in fact, a statistical description of these systems relying on random matrices could lead to predictions which agree with experimental results. We would like to import this idea of replacing a complex quantity such as wages –which depends on so many factors and interactions– by a random one in spatial economics. So, we treat the wage as if it was exogenous and random17, that is we write w(j) = s ηj where s represents the typical income in this city and η is a random number chosen uniformly in [0, 1]. Furthermore, we assume that the commuting time does not only depend on the distance between the two places, but also on the traffic Tij between those two locations. An individual living at i will thus commute to the center j which corresponds to the best trade-off between income and commuting time, thus to the center j such that the quantity

is maximum17. The quantity dij is the euclidean distance between i and j (both supposed to be scattered randomly across the city), T(j) the total incoming traffic at j, c the capacity of the underlying transportation network and μ is an exponent describing the sensitivity of the network to congestion. The quantity ℓ is the maximum distance that people can financially travel daily, defined as the ratio between the typical individual income and the transportation costs per unit of distance.

This simple model displays a surprisingly rich behaviour17. In particular, it accounts for the monocentric to polycentric transition observed in most cities. It has been a well-known fact for quite some time that as cities grow, they evolve from a monocentric organisation where all the activities are concentrated in the same geographical area – usually the central business district– to a more distributed, polycentric organisation11,23. Several theories in spatial economics exist1, but are not satisfactory for many reasons. Among other things, they do not take congestion into account and have no predictive, testable content24. Within this framework, congestion is actually responsible for the transition and the number of activity centers in a city of population size P is on average given by


Using data of employment per Zip Code Area in the US17, we showed that

where we measure α = 0.64 ± 0.12 (95% confidence interval (CI)). In other words, the number of centers scales sublinearly with population size.

Computing the exponents


At this stage, the number of centers is a function of population and the area

and we need an additional equation in order to get a closed system. Here we focus on the area and its evolution with the population size, which reflects the growth process of the city. In the following, we will investigate two different approaches. It is worth noting that both approaches give results in qualitative agreement, showing that some stylized facts –such as super- or sublinearity– are very robust.

Fitting procedure. In the absence of knowledge of the processes responsible for urban sprawl, we can assume that the area behaves as

where a is the exponent to be determined, through fits on data. The empirical value for the exponent for the US data is . Once this exponent is given we can then compute the various exponent for the quantities of interest (see the following and table 2). We get for the number of centers k

which is sublinear as long as a < 2, in agreement with the empirical results for US cities. As we will see, this approach yields the same qualitative behaviours as those predicted with the method of the next section. In other words, even if the main mechanism behind urban sprawl is not congestion, the conclusions of this paper are not affected as long as the area scales sublinearly with population.

Table 2 This table displays the predicted theoretical behavior and the empirical observations versus the population size P for different quantities: Ltot is the daily total driven distance, A is the area of the city, LN is the total length of the road network, δτ is the daily total delay due to congestion, Qgas is the yearly total consumption of gasoline and is the total CO2 emissions emitted yearly due to transportation. In the third (fourth) column, we show the predicted values of the exponent of P using the value of α (of a) measured on US data. In the fifth column, we show the value of the exponents directly measured on data about US and OECD cities. The measured values are in good agreement with the prediction. In particular, the exponents for LN and δτ are consistent with our prediction that their difference should be 1/2

Coherent growth. Let us now assume that the scaling of A with population is determined by the number of activity centers and the constant commuting length of individuals. This means that the growth of the area is controlled by the appearance of new activity centers. if we assume that a city is organized around k activity centers and that the attraction basin of each of these centers are spatially separated17, we then have A ~ kA1 where A1 is the area of each subcenter's attraction basin. This area A1 is related to the average individual commuting distance by and we obtain

This leads to expression for the number of centers

which is always smaller than 1, also in agreement with the empirical results for US cities. We can now also compute the scaling of the surface area

We further assume that Ltot/P is a fraction of the longest possible journey ℓ individuals can afford, that is to say

It is important to note that if ℓc is independent from ℓ, the quantitative predictions of our model would still hold. The final expression for the area is then here given by

where . The exponent δ is smaller than 1/2 whatever μ ≥ 0, which implies that the density of cities increases sublinearly with population. In other words, the density of cities increases with population. We verify this prediction in Table 2, with data about land area of urbanized areas in the US (Figure 2). We find 2δemp = 0.85 ± 0.01 (95% CI) which is not too far from the theoretical value 2δth = 0.64 ± 0.12 (95% CI), equal to α in this case.

Figure 2
figure 2

Mobility and city structure and their impact on agglomeration economies and diseconomies.

(a) Variation of the daily total driven distance with the population for 441 urbanized areas in the US in 2010. The dashed line shows the power-law fit with exponent 0.595 ± 0.026 (r2 = 0.90). (b) Variation of the land area with population for 3540 urbanised areas in the US in 2010. The fit assuming a power-law dependence gives an exponent 0.853 ± 0.11 (r2 = 0.93). Both exponents are smaller than 1, as predicted by our theory. (c) Variation of the total lane miles with population for 363 urbanised areas in the US. A power law fit (dashed line) gives LN/ℓ = P0.765±0.033 (r2 = 0.92). The sublinear behaviour –which agrees with our prediction–means that larger cities need to spend less in infrastructure per capita than smaller ones. (d) Variation of the total delay due to congestion with population for 97 urbanised areas in the US. A power law fit gives an exponent 1.270 ± 0.067 (r2 = 0.97). The superlinear behaviour agrees with the prediction given by our model and challenges the claims of sustainability of cities.

Because the area of an urban system results from centuries of evolution, we do not a priori expect our model–where individual vehicles are assumed to be the only vector of mobility– to give a prediction valid for all countries and all times. Nevertheless, these results give us reasons to believe that the spatial structure of the journey-to-work commuting should still be the dominant factor in the dependence of land area on population.

Total commuting distance

Using Eq. 6 and Eq. 22 we are now able to compute

We plot for urbanized areas in the US on Figure 2 and one can check in Table 2 that the exponent predicted from the previously measured value of α agrees well with the exponent measured on the data. In the fitting case, the exponent would simply be given by 1 − a/2.

Total length of roads

If we use the previously derived expression for the area A, we find

The quantity δ is less than 1/2, which implies that LN scales sublinearly with the city's population size. In other words, larger cities need less roads per capita than smaller ones: we recover the fact that agglomeration of people in urban centers involves economies of scale for infrastructures. Within the fitting assumption (Eq.16), we would obtain (1 + a)/2.

Total delay due to congestion

Unfortunately, agglomeration in cities does not only generate economies. Congestion, for instance, is a major diseconomy associated with the concentration of people in a given area. A simple way to quantify the impairement caused by traffic congestion is through the total delay it generates. If we make the first order approximation that the average free-flow speed v is the same for everyone, the total delay due to congestion is given –according to our model–by

If we assume that all the centers share the same number of commuters –a reasonable assumption within our model17–we obtain

which, using the expressions for Ltot and A given in Eq. 23 and Eq. 22 respectively, gives

The total commuting time corresponding to the same distance but without congestion scales as τ0 ~ Ltot and thus less rapidly than the total delay which scales super-linearly with population (even when polycentricity is taken into account). This means that, for the largest cities, delays due to congestion actually dominate the time spent in traffic and that economical losses per capita due to the time lost in congestion –and the corresponding strain on people's life– increase with the size of the city.

In the fitting assumption Eq. 16 and using the same arguments for the calculation of δτ, we easily obtain for the exponent the value .

Transport related CO2 emission. Gasoline consumption

Another diseconomy associated with congestion is the quantity of CO2 emitted by cars and the gasoline consumed by motor vehicles. This amount not only depends on the distance that has been driven, but also on the traffic during the journey. It indeed turns out that for the same length driven, a car burns more oil when the traffic is heavy than when the road is clear. Within our model, the presence of traffic is seen in the time spent to cover a given distance and we write that the quantity of CO2 emitted by a vehicle is proportional to the total time spent in traffic, leading to

where q is the average quantity of CO2 produced per unit time. In the polycentric case with k = k(P) subcenters, the typical trip length is given by and we obtain

The first term in brackets is a constant and the quantity of CO2 is thus dominated by congestion effects at large populations

and the total daily transport-related CO2 emission per capita thus scales as

The quantity of CO2 emitted per capita in cities thus increases with the size of the city, a consequence of congestion. This prediction agrees with the exponent we measure (Figure 3) on data gathered for US and OECD cities (Data about the area and population of urbanised areas can be found on the Census Bureau website38, data about congestions in urban areas can be found in the Urban Mobility Report39 and data about the total lane miles and the daily total miles driven in urbanized areas can be found on on the Federal Highway administration website40). We are aware that the scaling of CO2 with population size is controversial, with results varying from one study to another. Although a systematic meta-analysis of these results is beyond the scope of this paper, we note that the authors of27 are concerned with the total emissions of CO2, while this paper is only concerned with emissions due to transportations. Moreover, our prediction agrees well with the exponent of 1.33 measured by the authors of26 on the same dataset, but with a different definition of cities. Finally, our prediction also agrees with measurements made in28 for developing countries.

Figure 3
figure 3

Variation of CO2 emissions due to transport with city size.

In blue, excess CO2 (in tons) due to congestion, as given by the Urban Mobility Report (2010) for 101 metropolitan areas in the US39. In green, we show the estimated CO2 emissions (in tons) due to transports, as given by the OECD for 268 metropolitan areas in 28 different countries (Data about the total CO2 emissions due to transportation in major metropolitan area in the OECD can be found online41). The dashed yellow lines represent the least-square fit assuming a power-law dependency with multiplicative noise, which gives respectively for the US data and for the OECD data.

Another important related quantity is the the consumption of gasoline which in principle is proportional to the emission of CO2 and the time spent driving. The total daily gasoline consumption is then given by

where q is the average quantity of gasoline needed per unit time. From this expression, we see that the total daily gasoline consumption per capita scales as

and is therefore not a simple function of the city density, in contrast with what was suggested by the seminal paper of Newman and Kenworthy4. At this stage however, more data about gasoline consumption is needed to test this prediction and draw definitive conclusions.


Monocentric versus polycentric

Although polycentricity emerges naturally from our model as a result of congestion, many circumstances can prevent or foster the appearance of new activity centers in a city. There are many debates as to whether policies should favour polycentric or monocentric developpement of cities. Most of them are based on ideologies and opinions about how cities should be, very few are based on a quantitative understanding of the city as a complex system. Although this only represents a small part of the debate, our model allows to quantify the effect of polycentricity on the total delay due to congestion.

We can indeed compute the total delay due to congestion in the case of a monocentric configuration. In this situation, all the population commutes to a single destination 1 and we have

It follows, using the expression given above for Ltot

From the fact that , we indeed find that the total delay due to congestion is worse for monocentric cities than it is for polycentric cities with the same population, which agrees with the usual intuition. More precisely the ratio of delays is given by

where the exponent is of order β ≈ 0.57. Therefore, even though diseconomies associated with polycentric cities scale superlinearly with population, it would be even worse if we did not let cities evolve from the monocentric case. The same reasoning applies to the consumption of gasoline and the CO2 emissions. This suggests that, everything else being equal, polycentricity should be favoured for quality of life and environmental reasons.

Megapolis versus urban villages

Also, given the superlinear behaviour of the diseconomies associated with living in cities, it is clear that we would be better off living in several smaller cities rather than a single huge city. However, due to the economies of scale realised in large cities, we can wonder whether this is also economically reasonable. If we assume that the total cost of a city of population P is the sum of its infrastructure cost and the economical losses due to congestion we have

where is the average cost of a kilometer of roads, the average hourly wage and Δt the planning horizon in years (this expression is not exhaustive, as the costs dues to CO2 emissions and gasoline consumptions are not included). The infrastructure needs maintenance and its cost depends on the planning horizon as well and can be written where is the construction cost in $/km and the maintenance cost in $/km/year.

We assume that the population P is distributed among n cities of the same size P/n (see Figure 4). The total lane miles for the n cities reads where is the total lane for one city. The total congestion delay for n cities is δτn = nδτ(P/n) and we thus obtain the total cost CT(P, n) for n cities

The number of cities nmin which minimises the total cost is obtained when , leading to (for )

(the actual number of cities is of course an integer and can be taken as the nearest integer from nmin for instance). It is then economically advantageous to divide the population in several cities if nmin > 2. To illustrate this point, we compute the number of cities which would minimise the cost for a world population P ≈ 109. The World Bank estimates the maintenance cost of roads to be of the order of and the average hourly wage to be of the order of , the value of δ is taken from the measures on US data, δ ≈ 0.27 and τ/ℓ ≈ 10 km/h. We then obtain

which gives an average city size of P/n ≈ 5, 500, 000. This result is to put in perspective with the fact that the world hosts 40 or so cities with over 5, 500, 000 inhabitants and that this number is still increasing.

Figure 4
figure 4

Scaling down.

We consider a population P and see how indicators change when we compare it with a system with many cities and the same total population. (a) Variation of the yearly delay per capita due to congestion with the number of cities (normalised by the value δτ(1) corresponding to the single city case). (b) Evolution of the infrastructure length with the number of cities (normalised by the value LN(1) corresponding to the single city case). Relative gains in terms of commuting time per person decrease faster than infrastructure costs increase, suggesting that life in cities could be improved at a relatively low cost by decentralisation.

The most economical population distribution

The previous results assume that we split a large city into many cities of the same size. The cities are however organized in various sizes distributed according to something that can be approximated by a Pareto distribution, as known since Zipf's work29. It is still unclear why we observe such a convergence30,31. We propose here a new perspective to this debate by asking: Assuming cities are distributed according to a Pareto distribution, what value of the exponent minimises the overall cost? Indeed from above the total cost for a population size x is given at large times by

We assume that the population is distributed according to

with γ > 1 and a cut-off population (which is at most equal to the world's population). The average cost is then given by

leading to

The only consistent solution is obtained for γ < δ + 2. The dominant term for is given by

The optimal power law distribution minimizes the average cost and is such that . We obtain the following equation

and in the limit we obtain the optimal value for γ

Numerically, δ ≈ 0.32 and Λ ≈ 109, leading to γ* ≈ 2.27. It is interesting to note that this value would lead to a rank-plot exponent (≈0.78) not far from those measured on different countries around the world32. Although we do not pretend that the above reasoning provides a definitive answer to the Zipf puzzle, it nevertheless suggests that the broad diversity of population might derive from economical considerations and that there may be a connection between the Zipf law exponent and optimality considerations.


The superlinear increase of congestion delay with population and thereby of gasoline consumption and of CO2 emissions, has terrible consequences on the economy, the environment, health and well-being. The outlook is nothing short of grim in our ever-urbanising world. As the proportion of human beings living in cities dramatically increases –the UN expects the world population to be 67% urban in 205033– wages are likely to increase7 but not enough to compensate for the negative effects of congestion. As a result, if the individual car stays the dominant transportation mode, cities will put more strain on people's life, while acting as catalysts for the production of CO2 greenhouse gas, responsible for an overall increase of the planet's temperature34. It is currently believed that advantages associated with living in a large city outweigh the costs. Our results reveal however the existence of very rapidly growing problems such as congestion and CO2 emissions, which inevitably begs the question of the sustainability of large cities. It might be time to cut down considerably the use of individual vehicles, or to consider the possibility of living in smaller or medium sized cities: the infrastructure costs (LN) may be larger, but the impact on the environment (CO2 emissions) and on the well-being of people (delays in congestion) would be beneficial (see Figure 3).

The most striking fact about the above results is that despite the apparence of complexity that is conveyed by cities, most of their structure can be explained by the very simple and universal desire for the best achievable balance between income and commuting costs. Our model unifies mobility patterns, spatial structure of cities and allometric scalings in a framework that can be built upon. More work is needed in order to integrate information about firm locations, the influence of public transportation on mobility patterns35, the effect of the integration of cities into urban systems36, to understand the fluctuations around the average trends and to test the validity of the model on different sets of data. We believe however that the results presented here represent a crucial step towards a scientific understanding of cities.



As recently stressed in37, when trying to identify patterns accross cities, one must be careful and consistent in the definition of city boundaries. These authors indeed found out that the scaling exponents measures for several quantities are usually sensitive to the definition chosen for the city. In order to make our results reproducible, we detail in the following the data source and the corresponding city definition.

Total distance driven and lane miles

The daily commuting vehicle-miles as well as the total lane miles data were obtained for the year 2011 from the Federal Highway Administration for 441 Urban Areas (as defined by the Census Bureau) in the US.


The surface area data were obtained for the year 2010 from the Census Bureau for 3540 Urban Areas (as defined by the Census Bureau) in the US. It is interesting to note that the dependence of the surface area of Metropolitan Statistical Areas with population is a lot less clear-cut, implying that, with respect to surface area, the definition of UAs delineates more coherent systems than the definition of MSAs.

Values of ℓ

In order to compute a value for , we use for s the average wage at the county level, provided by the Bureau of Labor Statistics. For t, the transportation cost per unit distance, we use the average gas price per state as given by the U.S. Energy Information Administration and assume that all vehicles burn the same quantity of gas per unit distance on average. Interestingly, while we have assumed a constant ℓ throughout this paper, we have noticed that its effect on the different scalings was not negligible (Compare the results for LN and A between Table 1 and Table 2 for instance), implying that ℓ has a small, yet non-zero dependence on the population. This probably comes from the dependence of the average wage on population7. We leave the investigation of this dependence for further studies.

Total delay and CO2 emissions

The excess CO2 and the total delay due to traffic congestion were obtained for the year 2012 from the Urban Mobility Report for 97 Urban Areas in the US. Also, the quantity of CO2 emissions due to transportation was obtained from the OECD for 268 metropolitan areas accross 28 countries for the year 2008. It is worth noting here that the US definition of Urban Area and the OECD definition of Metropolitan Area are qualitatively different, added to the fact that OECD data cover many different countries. Yet, the measured values of the exponent are compatible with each other.

As far as the United States are concerned, we present results for Urban Areas only. Indeed, when data were available for both MSA and Urban Area, we found out that the MSA data did not exhibit as clear-cut regularities as the Urban Area data did. We believe that this effect is due to the lack of a unique, quantitative definition of a city which makes In this work, we assumed that Urban Areas designate areas which are coherent with respect to the quantities we are measuring and leave the crucial issue of city definition for further studies.