Uncovering the socioeconomic facets of human mobility

Given the rapid pace of recent urbanization, a better understanding of how urban infrastructure mediates socioeconomic interactions and economic systems is of vital importance. While the accessibility of location-enabled devices, as well as of large-scale datasets of human activities, has fueled significant advances in our understanding, there is little agreement on how socioeconomic status influences movement patterns and, in particular, on the role of inequality. Here, we analyze a heavily aggregated and anonymized summary of global mobility and investigate the relationships between socioeconomic status and mobility across a hundred cities in the US and Brazil. We uncover two types of cities: those with a clear connection between socioeconomic status and mobility, and those with little to no interdependence. The former tend to be characterized by low levels of public transportation usage, inequitable access to basic amenities and services, and communities segregated into clusters by income, while the latter show the opposite trends. Our findings provide useful lessons for designing urban habitats that serve the larger interests of all inhabitants irrespective of their economic status.

Box-plots of mobility outflows for the top and bottom 20% of zip codes in terms of median income, for the US (top) and Brazil (bottom). The data are binned by population, and the boxes indicate the quartiles of the distributions. In all four cases, population is positively correlated with mobility (Pearson correlations: US < 20%: 0.85, > 80%: 0.73; BR < 20%: 0.81, > 80%: 0.67), indicating that mobility is representative of population regardless of income.

S4 Cluster analysis
The clustering of cities was done using a divisive hierarchical method to split the cities into disjoint groups of urban areas with similar patterns in their mobility/socioeconomic correlations. To determine the ideal number of partitions, we use a silhouette analysis. The silhouette value measures how similar a data point is to the cluster it has been assigned to, compared with its average distance to the nearest other candidate cluster. More precisely, the silhouette score $s(i)$ of a data point $i$ with regard to a partitioning structure $C$ can be defined as

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}},$$

where

$$a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j)$$

is the mean distance $d(i, j)$ between the point $i$ and all other points $j$ in the same cluster $C_i$, and $b(i) = \min_{C_k \neq C_i} \frac{1}{|C_k|} \sum_{j \in C_k} d(i, j)$ is the minimum average distance of the point $i$ to the points of one of the other clusters. In other words, $b(i)$ is the distance of $i$ to the cluster it is most similar to, other than the one it has been assigned to. Therefore, $s(i) \in [-1, 1]$, and if $s(i) > 0$ the point is on average closer to its own cluster than to any other, i.e., it is in the best cluster it could belong to. Conversely, if $s(i) < 0$, the point $i$ is not assigned to its most natural cluster, suggesting that the partitioning structure is not ideal. The ideal partitioning structure is expected to be the one that produces only positive silhouette scores while maximizing the average silhouette score. Figure S4 shows the average silhouette scores for $k = 2$ to $k = 10$. For both countries, $k = 2$ produces the maximum average silhouette score.

Figure S7: Reproduction of core findings using the LODES commuting dataset. A The identified city clusters are similar (Fowlkes-Mallows score = 0.8), though not identical, to the clusters shown in Fig. 4. B The trends in the correlation between income and average flow distance are similar to, but less exaggerated than, those found when using location data. The correlation with out-strength per capita, however, is quite different. C The difference in mass transit usage between clusters. D Finally, while we see limited differences in amenity distance between clusters at most income levels, we find that in the teal cluster, high-income individuals live on average farther from basic amenities, similar to the trend seen in Fig. 4.

Figure S8: A Dendrogram obtained from the socio-mobility clustering analysis of Brazilian cities. Cluster 1 (blue) contains almost all the largest metropolitan areas and the capital cities of their respective states, with the exceptions of Lagos and São José dos Campos, whereas the only state capital in Cluster 2 (yellow) is Florianópolis. B Spearman correlation between the number of residents in each income bracket and the average flow distance and out-strength per capita. All cities show a high correlation; the only differences between them are the magnitude of this correlation and how it affects the middle class. C Fraction of mass transit usage for each of the clusters. D Fraction of basic amenities as a function of distance for 5 income levels.
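The silhouette-based choice of $k$ in S4 can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the study's code: it assumes Euclidean distances between per-city feature vectors (the distance used in the study is not stated here), it takes the candidate partitions for each $k$ as given rather than reimplementing the divisive hierarchical clustering, and the function names `silhouette_samples` and `best_partition` are hypothetical. The selection rule mirrors the text: prefer partitions whose scores are all positive, then maximize the mean.

```python
import math


def silhouette_samples(points, labels):
    """Return s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every point.

    Assumes Euclidean distance d(i, j) and at least two clusters.
    """
    n = len(points)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette scores need at least two clusters")
    scores = []
    for i in range(n):
        # Average distance from point i to the members of each cluster (i excluded).
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(points[i], points[j]) for j in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            # i is alone in its cluster: a(i) is undefined, s(i) is conventionally 0.
            scores.append(0.0)
            continue
        a = mean_dist[own]                                    # a(i)
        b = min(d for c, d in mean_dist.items() if c != own)  # b(i)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def best_partition(points, partitions):
    """Choose k from a hypothetical {k: labels} mapping of candidate partitions.

    Rule: rank all-positive partitions first, then by mean silhouette score.
    """
    def rank(k):
        s = silhouette_samples(points, partitions[k])
        return (all(x > 0 for x in s), sum(s) / len(s))
    return max(partitions, key=rank)


# Toy example: two well-separated groups of 1-D "city" features.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
print(best_partition(points, candidates))  # → 2
```

Ranking by the tuple `(all_positive, mean)` lets Python's tuple comparison encode the two-step rule directly: any partition with only positive scores outranks one containing zero or negative scores, regardless of mean.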

S6 Amenity diversity
The amenity diversity was defined as the Shannon entropy computed from the relative frequencies of the amenity types. In our analyses we considered a set of eight major categories of amenities, namely A = {art-culture, education, entertainment, finance, healthcare, sustenance, transportation, other}. For each spatial location unit $l$ we computed the entropy

$$H_l = -\sum_{i \in A} p_i \log p_i,$$

where $p_i$ is the fraction of amenities in the area that are of type $i$, such that $p_i = n_i / N$, with $N$ being the total number of amenities in the area and $n_i$ the number of amenities of type $i$ (categories with $n_i = 0$ contribute nothing, by the convention $0 \log 0 = 0$).
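As an illustration of the entropy in S6, the sketch below computes $H_l$ for one location unit from a list of amenity category labels. It is a minimal sketch that assumes the natural logarithm (the base is not stated; changing it only rescales $H_l$) and a hypothetical input format of one label per amenity; `amenity_entropy` is an illustrative name, not part of the original analysis.

```python
import math
from collections import Counter

# The eight amenity categories A listed in S6.
CATEGORIES = ("art-culture", "education", "entertainment", "finance",
              "healthcare", "sustenance", "transportation", "other")


def amenity_entropy(amenity_types):
    """Shannon entropy of one location unit, given one category label per amenity.

    Uses the natural logarithm (an assumption; the base only rescales H).
    """
    counts = Counter(amenity_types)              # n_i for each category present
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown amenity categories: {sorted(unknown)}")
    total = sum(counts.values())                 # N
    if total == 0:
        return 0.0                               # no amenities: define H = 0
    # -sum p_i log p_i, written as sum p_i log(1/p_i) with 1/p_i = N/n_i;
    # absent categories (n_i = 0) never appear, matching 0 log 0 = 0.
    return sum((n / total) * math.log(total / n) for n in counts.values())


print(round(amenity_entropy(["education", "finance", "finance", "healthcare"]), 4))  # → 1.0397
```

A unit with a single amenity type has entropy 0, and a unit with all eight types in equal proportion reaches the maximum, $\ln 8 \approx 2.079$.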

S7 Spatial autocorrelation of areas according to income
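The spatial autocorrelation of median incomes across zip codes is calculated as Moran's I:

$$I = \frac{N}{W} \, \frac{\sum_{i} \sum_{j} w_{ij} \, (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2},$$

where $N$ is the total number of zip codes $i$ in a city, $x_i$ is the median income of zip code $i$, $\bar{x}$ is the mean of all median incomes in the city, $w_{ij} = 1$ if zip code $j$ touches zip code $i$ and 0 otherwise (with $w_{ii} = 0$), and $W = \sum_i \sum_j w_{ij}$ is the sum of all weights.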
The metric measures how similar zip codes are to their direct neighbors within a city, relative to the city average. $I$ is near −1 for a perfectly mixed (checkerboard-like) distribution of zip code incomes (zip codes are unlike their neighbors relative to the average, leading to negative numerator contributions), close to 0 for a random arrangement, and near 1 if high- and low-income zip codes are completely separated (zip codes are just like their neighbors, leading to positive numerator contributions). We can therefore interpret the autocorrelation of median income as a measure of income segregation: higher values indicate that zip codes of a particular income tend to be grouped together (as in the presence of "downtown areas"), whereas lower values indicate that incomes are more evenly mixed throughout a city.
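To make the behavior of $I$ concrete, here is a minimal sketch of Moran's I with binary contiguity weights, as defined in S7. It assumes the adjacency ("touching") relation is supplied as a symmetric neighbor mapping, which in practice would come from the zip-code polygons; the function name `morans_i` and the zip codes `Z1` to `Z4` are hypothetical.

```python
def morans_i(incomes, neighbors):
    """Moran's I of zip-code median incomes with binary contiguity weights.

    incomes:   {zip_code: median income x_i}
    neighbors: {zip_code: set of zip codes touching it}; w_ij = 1 if j is in
               neighbors[i], else 0 (so w_ii = 0). Assumed symmetric.
    """
    zips = list(incomes)
    n = len(zips)                                            # N
    mean = sum(incomes.values()) / n                         # x-bar
    dev = {z: incomes[z] - mean for z in zips}               # x_i - x-bar
    w_total = sum(len(neighbors.get(z, ())) for z in zips)   # W = sum of all w_ij
    numerator = sum(dev[i] * dev[j] for i in zips for j in neighbors.get(i, ()))
    denominator = sum(d * d for d in dev.values())
    if w_total == 0 or denominator == 0:
        raise ValueError("I is undefined without neighbors or income variation")
    return n * numerator / (w_total * denominator)


# Four hypothetical zip codes in a row: Z1 - Z2 - Z3 - Z4.
row = {"Z1": {"Z2"}, "Z2": {"Z1", "Z3"}, "Z3": {"Z2", "Z4"}, "Z4": {"Z3"}}
print(morans_i({"Z1": 1, "Z2": 3, "Z3": 1, "Z4": 3}, row))  # alternating → -1.0
print(morans_i({"Z1": 1, "Z2": 1, "Z3": 3, "Z4": 3}, row))  # grouped → about 0.33
```

With so few units the grouped layout only reaches 1/3, because the single low/high boundary is a large share of all adjacencies; for a row of $2m$ zip codes split into two halves, $I = (2m - 3)/(2m - 1)$, which approaches 1 as the segregated regions grow relative to their boundary.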