Morphology of travel routes and the organization of cities

The city is a complex system that evolves through its inherent social and economic interactions. Mediating the movements of people and resources, urban street networks offer a spatial footprint of these activities. Of particular interest is the interplay between street structure and its functional usage. Here, we study the shape of 472,040 spatiotemporally optimized travel routes in the 92 most populated cities in the world, finding that their collective morphology exhibits a directional bias influenced by the attractive (or repulsive) forces resulting from congestion, accessibility, and travel demand. To capture this, we develop a simple geometric measure, inness, that maps this force field. In particular, cities with common inness patterns cluster together in groups that are correlated with their putative stage of urban development as measured by a series of socio-economic and infrastructural indicators, suggesting a strong connection between urban development, increasing physical connectivity, and diversity of road hierarchies.


Data description
For each of the 92 cities, the maximum total number of unique driving routes is 630. However, after filtering out discrepant routes and OD pairs, the average number of valid routes we analyzed for each radius was 575.9 (2km), 532.8 (5km), 461.7 (10km), 391.0 (15km), 349.3 (20km) and 254.7 (30km). Supplementary Table 1 also describes the detour index (DI), i.e., the ratio d/r between the travel distance d and the geodesic distance r, for the fastest (F) and shortest (S) routes. Avg. waypoint represents the average number of route points as returned by the OSM API for each city.
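As an illustration of the detour index, the following minimal sketch computes DI = d/r for a route given as a list of (latitude, longitude) waypoints. It assumes that the travel distance d can be approximated by summing great-circle distances between consecutive waypoints and that r is the great-circle distance between the endpoints; the function names and the Earth-radius constant are illustrative and not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def route_length_km(waypoints):
    """Travel distance d: sum of the distances between consecutive waypoints."""
    return sum(haversine_km(p, q) for p, q in zip(waypoints, waypoints[1:]))

def detour_index(waypoints):
    """DI = d / r, with r the geodesic distance between origin and destination."""
    d = route_length_km(waypoints)
    r = haversine_km(waypoints[0], waypoints[-1])
    return d / r

# Hypothetical L-shaped route: east along the equator, then north.
l_shaped_route = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(round(detour_index(l_shaped_route), 3))
```

A straight route gives DI close to 1, and the index grows as the route deviates from the direct line between origin and destination.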

Routes as street samples
If we consider only the subset of the streets within the 30km radius of our analyses, on average, the shortest and fastest routes covered approximately 24% and 18% of the overall street networks, respectively (Supplementary Figure 2a). However, different routing approaches favor different roads: faster arterial roads and minor residential streets account for different aspects of the route optimization, and the two routing methods are therefore expected to cover different samples of a road network. The distribution of road types sampled by each routing method is depicted in Supplementary Figure 2b and c. As explained in the main text, arterial roads such as motorways and trunk roads are much more relevant for fastest routes than for shortest routes, which is reflected in how frequently they appear in each type of route. Supplementary Table 2 shows the fraction of the overall street network sampled by the shortest and fastest routes for each city as well as the ratio between the two fractions. Supplementary Figures 3 and 4 depict the participation of the most frequent road types for each city.

Supplementary Note 2 Comparison with network centrality measures
In this section, we compare the spatial distribution of inness with different centrality measures of relevance in the context of transport networks. The objective here is to verify to what extent inness can deliver additional relevant information for which traditional measures of centrality do not account. Without loss of generality, one can say that the inness of a certain point in the road network of a city is the result of the aggregation of the characteristics of the routes that potentially pass through that point. Indeed, it captures geometric aspects of the network since it is a measure based on the curvature of the roads along a route. It also reflects the structure of the network since it is a metric influenced by topology and network connectivity. Moreover, it also reflects the long-range topographical and geometric relationships since it is not a local measure of centrality but is actually a property of the route. Thus, if a certain point participates in routes with both positive and negative inness, and of equal magnitude, that point is expected to exhibit a null inness. On the other hand, if the inness of the routes crossing a given point is predominantly positive (or negative), we can safely say that that particular point is part of a functional sub-structure of the network with specific geometric characteristics.
We compared the inness with network centrality measures capable of reflecting different facets of the structures, such as topology, locality and geometry. The centrality measures of choice were:
Closeness is a measure of centrality that determines how close a particular node in the network is to all other nodes. Clearly, in the context of spatial networks in a finite area, the nodes with the greatest closeness are those close to the centroid of the network. In the context of spatial networks, closeness is essentially a geometric metric as the actual topology of the network has no importance in determining the centrality of a node.
Eccentricity is the largest geodesic distance between a given node and any other node in the network. In contrast with closeness centrality, eccentricity, by definition, does not reflect the geometric characteristics of the network, being mostly a topological metric.
Degree centrality is a measure of local connectivity of a node in which the importance of a node is determined by the number of nodes directly connected to it. It is primarily a measure of the local connectivity of the nodes, having no long-range correlations.
Betweenness is a measure that ascribes importance to those nodes lying along the shortest paths between other pairs of nodes such that the more the shortest paths crossing a node, the higher its betweenness centrality will be. It is probably one of the most investigated centrality measures, especially in the context of road networks.
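The four centrality measures listed above can be sketched on a toy graph as follows. This is a simplified illustration only: it assumes an unweighted, undirected graph in which distances are hop counts, whereas a road network would use street lengths or travel times. The hub-and-spoke graph and all function names are hypothetical; betweenness follows Brandes' algorithm and is left unnormalized.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop-count distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(graph, node):
    """(n - 1) divided by the sum of distances to all other nodes."""
    return (len(graph) - 1) / sum(bfs_distances(graph, node).values())

def eccentricity(graph, node):
    """Largest distance from node to any other node."""
    return max(bfs_distances(graph, node).values())

def degree(graph, node):
    """Number of directly connected nodes."""
    return len(graph[node])

def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):       # accumulate dependencies back towards s
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair is counted twice

# Hypothetical hub-and-spoke graph: hub "H" linked to four peripheral nodes.
star = {"H": ["A", "B", "C", "D"], "A": ["H"], "B": ["H"], "C": ["H"], "D": ["H"]}
print(closeness(star, "H"), eccentricity(star, "H"), degree(star, "H"), betweenness(star)["H"])
```

In this toy graph every shortest path between two peripheral nodes passes through the hub, so the hub ranks highest on all four measures, and none of them carries information about the direction in which routes bend.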
Supplementary Figure 1 shows the inness profile along with the four centrality measures for three large European cities. In each panel, values above the average are colored in red, and those below it in blue. The first striking feature we can observe is that the inness profile is very distinct from the ones produced by the centrality measures. For instance, none of the centrality measures managed to capture the concentric circumferential patterns produced by the ring-like structures in the network, manifested as low inness zones.
Another interesting pattern we can observe is the presence of regions with extreme inness profiles, suggesting large detour spots such that routes traveling through those regions have no option but to drive inward. For example, in the east of London, routes across the River Thames are pushed away from the city center (low inness, in blue) or pulled towards it (high inness, in red), a direct consequence of the absence of bridges in that area. Such functional insights into the system are not possible from the observation of the standard centrality measures in isolation.

Supplementary Note 3 Types of roads
We summarize the statistics of road types for our sample cities. We also compare the route samples with the complete road network within the same boundary we used for the inness calculation. The complete road network data was collected from the OpenStreetMap repository using a service 1 . In Supplementary Figure 2 we show the distribution of the road types in our routes data for each. See http://wiki.openstreetmap.org/wiki/Key:highway#Values for detailed information about the road type labels and their meanings.

Supplementary Note 4 Metric for road structure
We suggest three metrics to measure various facets of infrastructural and geographical features. We use the same road network data described in Supplementary Note 3.
Road length is the total length of motorways, trunks, primary, secondary and tertiary roads for each city.
Level of geographical constraints (GC) represents the overall fraction of the city that is not covered by the road network due to the presence of barriers. Here, we define a barrier as an area of the city that is inaccessible via public roads, being either natural (e.g., forests and mountains) or artificial (e.g., a large industrial or military site). To calculate the GC we generated 10,000 uniformly distributed points within the same area of our study and computed, for each point, its relative distance to the nearest point of the road network. More precisely, GC is defined in terms of r_d and r_c, where r_d is the distance from the random point to the closest street segment and r_c is the distance from the random point to the city center. For instance, if many random points in a particular area are closer to the center than to a road, GC becomes bigger, suggesting, therefore, the presence of a large barrier close to the center. In Supplementary Figure 8a, the point A is inside the urban area and has road segments nearby, while the point B is located in a mountainous area. Although the two points have similar r_c (the blue dashed line), B has a higher r_d (the red dashed line) than A and consequently the GC of B is higher than that of A. The term r_c accounts for barriers near the city center having a greater impact on routes than barriers of similar area in the periphery. In Supplementary Figure 9 we show two examples of representative cities with very different GC profiles.
As one can see, London has almost no regions of poor connectivity caused by geographical constraints. Mumbai, on the other hand, has many regions of little to no connectivity due to the presence of geographical constraints such as the large Sanjay Gandhi National Park (the brighter spot in the north-northeast region), and the Thane Creek, the inlet that isolates the city of Mumbai from the Indian mainland.
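A minimal sketch of how a GC-like score could be computed is given below. It assumes that the per-point value is the ratio r_d / r_c and that the city-level score is the mean of this ratio over the random points. This ratio is an assumed form chosen to agree with the description above (larger when a point is far from any road and close to the center); it is not a formula taken from the text. Planar coordinates in km, a circular study area centred on the city center, and all function names are likewise assumptions.

```python
import math
import random

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto ab, clamped to stay on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def random_points_in_disc(n, radius, rng):
    """n points distributed uniformly in a disc centred at the origin."""
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())  # sqrt gives uniform area density
        phi = 2 * math.pi * rng.random()
        points.append((r * math.cos(phi), r * math.sin(phi)))
    return points

def geographical_constraint(points, segments, center=(0.0, 0.0)):
    """Mean of r_d / r_c over the sample points (ASSUMED form of GC)."""
    ratios = []
    for p in points:
        r_d = min(point_segment_distance(p, a, b) for a, b in segments)
        r_c = math.hypot(p[0] - center[0], p[1] - center[1])
        if r_c > 0:
            ratios.append(r_d / r_c)
    return sum(ratios) / len(ratios)

# Hypothetical road layouts within a 30 km radius: one road versus a cross.
rng = random.Random(0)
sample = random_points_in_disc(1000, 30.0, rng)
single_road = [((-30.0, 0.0), (30.0, 0.0))]
cross_roads = single_road + [((0.0, -30.0), (0.0, 30.0))]
print(geographical_constraint(sample, single_road) > geographical_constraint(sample, cross_roads))
```

Under this assumed form, adding roads can only lower the score, and two points at the same distance from the center differ only through their distance to the nearest road, mirroring the comparison between points A and B.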
Peripheral connectivity represents the average value of all the acute angles of the higher-level peripheral roads, or more precisely, the motorways and trunks beyond a minimum distance from the center, in this case, 10km. These parameter choices are motivated by the reasonable assumption that a road segment with a high angle (i.e., close to 90 degrees) is likely to be part of a ring-like structure, which is presumably used more for connecting peripheries than spoke-like roads are. The greater the average angle, the more likely it is that the peripheries are connected, so the metric acts as a proxy for the presence of circumferential roads. To calculate the angle between the center and a road segment, we draw the shortest line between the center and the middle point of the road segment and measure the angle between that line and the road segment, as depicted in Supplementary Figure 8b.

Supplementary Figure 9: Examples of the spatial distribution of the geographical constraints (GC). The values of GC for 10,000 random points are spatially mapped on two sample cities: (a) London and (b) Mumbai. The same color scheme is applied to both cities with a range from 0 to 0.1. Note that London is one of the cities with low average and low standard deviation of inness (i.e., LL group) whereas Mumbai is a city with low average and high standard deviation of inness (i.e., LH group).
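The peripheral connectivity angle can be sketched as follows, assuming planar coordinates in km with the city center at the origin and road segments given as pairs of endpoints. The function names and the toy segments are illustrative; only the 10km threshold and the geometric construction (line from the center to the segment midpoint) come from the description above.

```python
import math

def segment_angle_deg(center, a, b):
    """Acute angle (0 to 90 degrees) between segment ab and the line from
    the center to the midpoint of ab."""
    mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
    rx, ry = mx - center[0], my - center[1]   # radial direction
    sx, sy = b[0] - a[0], b[1] - a[1]         # segment direction
    # The absolute value folds obtuse angles back into the acute range.
    cos_angle = abs(rx * sx + ry * sy) / (math.hypot(rx, ry) * math.hypot(sx, sy))
    return math.degrees(math.acos(min(1.0, cos_angle)))

def peripheral_connectivity(segments, center=(0.0, 0.0), min_dist_km=10.0):
    """Average angle over segments whose midpoint lies beyond min_dist_km."""
    angles = []
    for a, b in segments:
        mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        if math.hypot(mid[0] - center[0], mid[1] - center[1]) >= min_dist_km:
            angles.append(segment_angle_deg(center, a, b))
    return sum(angles) / len(angles) if angles else float("nan")

# Hypothetical motorway segments: one radial (spoke-like), one tangential
# (ring-like), and one close to the center that is ignored.
radial = ((12.0, 0.0), (14.0, 0.0))
tangential = ((15.0, -1.0), (15.0, 1.0))
inner = ((2.0, -1.0), (2.0, 1.0))
print(peripheral_connectivity([radial, tangential, inner]))
```

A purely radial network yields an average near 0 degrees and a purely circumferential one an average near 90 degrees, which is the sense in which the average angle acts as a proxy for ring roads.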

Supplementary Figure 10: Spatial distribution of cities for each group. Examples of cities of the types discussed in Fig. 4 of the main manuscript. The groups LL, LH and HH are classified according to the standard deviation and average values of the inness (LL group: low standard deviation and low average (close to zero); LH group: high standard deviation and low average; HH group: high standard deviation and high average).

Supplementary Note 5 Outlier cities
Some cities, such as Quanzhou, Dongguan, Qingdao, Kinshasa, Harbin, Surat and Kabul, exhibit an extremely high standard deviation in comparison with other cities (See SI, Section Fig for details on the outliers). For Quanzhou, Dongguan and Qingdao, which have relatively low average inness, most parts of these cities are shaped by geographical constraints such as proximity to the coast or a location along the path of a river. Just as geographical constraints influence the shape of the routes of cities in the third category, similar barriers strongly affect these outlier cities. For instance, Kinshasa, Harbin, Surat and Kabul basically belong to the second category, i.e., a "hub and spoke" structure with a strong positive inness signal. However, these cities also have negative inness values due to a lack of infrastructure (e.g., bridges) connecting different parts of the city across the rivers.

Supplementary Note 6 The correlation between shortest and fastest routes
When we take into account the inness of the fastest routes, we are indirectly incorporating the influence of second-order structural features of the network such as roadway capacity and speed limits. The analysis of the fastest routes, therefore, offers an additional perspective that is, to a certain extent, closer to the real operation of that structure. However, a decidedly more elaborate picture of the structure of the road network can be obtained by means of a quantitative characterization of the geometric similarities and, above all, of the discrepancies between the shortest and fastest routes. It is only through this comparison that we can verify where, and with what magnitude, the path capacity influences the geometry of the routes. For example, if for a given pair of origin and destination the shortest and fastest routes have distinct inness profiles, this difference is only possible because the segments along the fastest route are potentially more temporally efficient.
The correlation between the inness of the shortest and the fastest routes is a measure capable of revealing this difference between the two route types. In fact, those urban systems where the shortest and the fastest routes differ little are those where any increases in distance are not offset by gains in terms of travel time. From a purely structural perspective, this could be said to be an efficient road network, in that the fastest routes are also the shortest routes.
Our hypothesis, however, is that this similarity can occur for two main reasons: (1) greater homogeneity in terms of road capacity and/or (2) low road network capillarity. Therefore, we used three different correlation measures to classify cities according to their similarity profiles between the shortest and fastest routes.

Classifying cities based on their Inness profiles
Although we are not claiming that the cities can be naturally classified into different discrete groups, here we show that the correlation between the inness of the shortest (I_s) and fastest (I_f) routes can be used as a metric to classify cities. Thus, we computed three correlation measures, namely the Pearson correlation, Spearman's rank-order correlation and the Kendall rank correlation. The measures were computed by comparing the inness of the average shortest and fastest routes for each radius/angle value.
For each of the N ≤ 36 routes with radius r and angular distance θ we computed the average I_s and I_f, with the inequality being due to the existence of unfeasible paths for certain OD pairs, and measured the correlation coefficients between the two inness arrays. The rationale for using three correlation metrics is that in this way we can characterize the said dissimilarities in a higher-dimensional space, accounting not only for the absolute values of the inness but also for the ranks of the (r, θ) pairs in terms of their inness.
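The three correlation measures can be sketched in plain Python as follows. The arrays `i_s` and `i_f` are made-up toy values standing in for the average shortest and fastest inness per (r, θ) pair of one city; they are not data from the study. The Kendall coefficient is implemented in its tau-b variant, which is an assumption, since the text does not state which variant was used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = tied_x = tied_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue              # tied in both variables: ignored
            if dx == 0:
                tied_x += 1
            elif dy == 0:
                tied_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt((pairs + tied_x) * (pairs + tied_y))

# Toy inness arrays (illustrative values only).
i_s = [0.1, 0.2, 0.3, 0.4, 0.5]
i_f = [0.1, 0.25, 0.2, 0.6, 0.9]
city_profile = (pearson(i_s, i_f), spearman(i_s, i_f), kendall_tau_b(i_s, i_f))
print(city_profile)
```

Each city is then described by a triple of coefficients, so that cities can be compared both on the absolute inness values (Pearson) and on the ordering of the (r, θ) pairs (Spearman and Kendall).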
We then applied a hierarchical clustering method to produce a partitioning of the cities based on their similarities in terms of their fastest and shortest inness profiles. The method is a standard complete linkage clustering method in which the distance between two groups is taken to be the maximum possible distance between points belonging to the different groups. Supplementary Figure 12 shows the dendrogram of the partitioning. Next we computed the within-clusters sum of squared deviations (WCSS) to quantify how much of the variance could be explained by partitioning the cities into k clusters. Clearly, a perfect partitioning would be one in which each cluster contains one single city. Supplementary Figure 13 shows the WCSS as we increase the number of clusters. As we can see, most of the variance can be explained by three clusters and only very little variance is explained by increasing the number of clusters from 4 to 5, suggesting that the best partition would be one with k = 3 or k = 4. Below we show one partitioning obtained from the Pearson correlation coefficient ρ.

Supplementary Figure 13: Within-clusters sum of squared deviations (WCSS) as a function of the number of clusters k. Most of the variance can be explained by three clusters and only very little variance is explained by increasing the number of clusters from 4 to 5, suggesting that the best partition would be one with k = 3 or k = 4.
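The clustering step and the WCSS curve can be illustrated with a small sketch. It assumes that each city is a point whose coordinates are its three correlation coefficients and that distances between cities are Euclidean; both are assumptions made for illustration, and the six points below are invented. The agglomerative procedure is a naive complete-linkage implementation, not the code used in the study.

```python
import math
from itertools import combinations

def complete_linkage(points, k):
    """Merge clusters until k remain; the distance between two clusters is
    the largest distance between any pair of their members."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        return max(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]   # safe because a < b
    return clusters

def wcss(points, clusters):
    """Within-clusters sum of squared deviations from each cluster centroid."""
    dim = len(points[0])
    total = 0.0
    for members in clusters:
        centroid = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        total += sum(math.dist(points[i], centroid) ** 2 for i in members)
    return total

# Hypothetical (Pearson, Spearman, Kendall) triples for six cities.
cities = [
    (0.90, 0.90, 0.80), (0.88, 0.91, 0.79),
    (0.50, 0.50, 0.40), (0.52, 0.48, 0.41),
    (0.10, 0.10, 0.05), (0.12, 0.08, 0.06),
]
for k in range(1, len(cities) + 1):
    print(k, round(wcss(cities, complete_linkage(cities, k)), 5))
```

The WCSS drops to zero when every city forms its own cluster, so the useful choice of k is the point beyond which adding clusters reduces the WCSS only marginally.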

Socio-economic indicators
As we presented in the main manuscript, the (dis)similarities between the inness profiles produced by the shortest and fastest routes are often rooted in the level of development of the road infrastructure, which in turn is driven by the socio-economic development of the cities. We then explored the correlation of I_s and I_f with three relevant indicators that could reflect the said stages of development, namely the productivity index (PI), the infrastructure development index (IDI) and the GDP per capita of the cities. The first two indexes (PI and IDI) are part of the City Prosperity Index (CPI), to date the most comprehensive measure of the development of a city, developed by the United Nations Human Settlements Programme (UN-Habitat). Each one of the six CPI indexes (including the PI and IDI) is defined in terms of an array of other indicators such as household income, economic specialization and housing infrastructure. The decision to employ the PI and IDI is motivated by the fact that these are the indexes most closely related to the structural development of the cities. For more details on the CPI indexes we refer the interested reader to the UN-Habitat Methodology and Metadata report 2 .
The third indicator we used, i.e., the GDP per capita of the cities, is based on the GDP@Risk estimate, a projected GDP of the cities based on the World City Risk Index, a risk-assessment metric developed by the Cambridge Centre for Risk Studies and published in the Lloyd's City Risk Index. More precisely, the index is a projection from 2015 to 2025 of the GDP, accounting for different risk factors, for 301 of the world's major cities. More detailed information on the methodology can be found in the report 'World City Risk 2025: Part 2 Methodology' 3 .
The decision to use a projected GDP, instead of the nominal GDP estimates officially published by the governments, is justified by three main reasons. The nominal GDP of a city is subject to some volatility due to many internal and external factors, in contrast with the transportation infrastructure of a city, which tends to evolve over longer periods of time. Moreover, there is considerable methodological variation in the way the nominal GDPs are estimated, especially for non-OECD cities. Additionally, the most recent official GDP data does not necessarily correspond to the same period for different cities.
On the other hand, the projected GDP of the cities is a standardized metric based on the same scientific methodology for all the cities accounting for many factors of internal and external origin, from present infrastructure to potential natural disasters. Moreover, the GDP projection can reflect with a reasonable precision the potentialities of growth for a city, in which the level of development of the infrastructure plays a major role.

New York
In contrast with other large developed cities (type I), New York exhibited similar inness patterns for the shortest and fastest routes. The reason for this can be related to the geography of the motorways in the New York metropolitan area. Unlike other cities, New York does not have a strong ring-like motorway structure in its periphery, the kind of structure that is often preferred when it comes to congestion reduction and travel-time optimization. Instead, it has many radial and grid-like motorways, which have a limited effect on the inness patterns, as shown in the spatial distribution of the fastest routes. This particularity of the motorways of New York gives it unique inness characteristics, although further investigation of other factors (e.g., socioeconomic characteristics) is necessary.