A generalized vector-field framework for mobility

Trip flow between areas is a fundamental metric for human mobility research. Given its identification with travel demand and its relevance for transportation and urban planning, many models have been developed for its estimation. These models focus on flow intensity, disregarding the information provided by the local mobility orientation. A field-theoretic approach can overcome this issue and handling both intensity and direction at once. Here we propose a general vector-field representation starting from individuals' trajectories valid for any type of mobility. By introducing four models of spatial exploration, we show how individuals' elections determine the mesoscopic properties of the mobility field. Distance optimization in long displacements and random-like local exploration are necessary to reproduce empirical field features observed in Chinese logistic data and in New York City Foursquare check-ins. Our framework is an essential tool to capture hidden symmetries in mesoscopic urban mobility, it establishes a benchmark to test the validity of mobility models and opens the doors to the use of field theory in a wide spectrum of applications.

Early research works mainly used data from transportation surveys and census to analyze people travel and activity patterns [8,33].The collection of such mobility data can be expensive and time-consuming, and it is hard to acquire a sufficient sample size [34,35].With the development of information and communication technologies (ICTs), the availability of large-scale high-resolution mobility data, such as call detailed records, GPS-located taxi data and online social network data, has notably increased.This enabled researchers to quantitatively study various mobility contexts and put forward new models to analyze the underlying mechanism of mobility [8,[36][37][38][39][40][41][42].
In terms of models, mobility can be studied at two different levels: individual trajectories and aggregated flows between areas.Individual mobility models usually focused on characterizing the behavior of individuals in the process of selection of destinations, adding a certain degree of stochasticity to account for people heterogeneity and free will (see [36,[43][44][45] for some examples and [8] for a recent review).At the aggregated level, the earliest models fall into two families: the gravity [13,46] and the intervening opportunity model [14,47], which has later evolved into the so-called radiation model [48].These models and their updated versions predict flows between locations, describing travel distance distribution and defining locations attractiveness [48][49][50][51][52][53].However, they are not designed to capture the local flow orientation, which is a spatial information that plays a significant role on describing mesoscopic mobility patterns [54,55].
Such combination of mobility flow intensity and orientation can be studied using a field theoretical framework.The idea of using fields and potentials for studying mobility emerged in the context of the gravity model, where these concepts appear in a natural way [56,57].The lack of data prevented further advances on this direction, until a recent work [58] proved that a vector field framework could be used to characterize trips between home and work (commuting) in a number of cities in the world.Not only that, this framework was able to solve a controversy almost 80 years old on which of the two families of models performs best to describe commuting.The gravity model produces results that matches empirical commuting mobility patterns both in intensity and orientation of the flows.A field representation has lately been used in a machine learning context, where knowing the potential can significantly improve the performance of the model to predict traffic flows [59].Later, some works translated the field approach to lower scales of mobility (single flows) [55,60,61], pedestrian route selection [54] or the mobility associated to the celebration of special events [62].Nevertheless, it is not yet clear how the definition of mesoscopic mobility fields can be extended to any type of mobility starting from individual trajectories and what are the features that may permeate from the microscopic mobility information to the mesoscopic scale.
In this work, we introduce a definition for the vectorial framework of mobility encoded in individual trajectories valid for all mobility types.We investigate how individuals' choices determine the features of the mobility vector field.We find that empirical trajectories extracted from logistic motivated trips in Chinese cities and Foursquare check-ins in New York City lead, following our definition, to well-behaved vector-fields, which satisfy the di- 1. Definition of trajectories orientation and resulting vector field.The large gray circle stands for an idealized city (with the black point as the city center), the green numbered circles are the stops sequence of each trajectory (1-2-3-4-1, 5-6-4-5 and 5-7-5), while the square in 1 and the hexagon in 5 are the trajectories' origins.a City center as RP.Vectors of each color connect consecutive stops of a trajectory, for example, the trajectory 1-2-3-4-1.When the vector connecting X to Y forms an acute angle with the position vector #« X starting in the city center, we mark the vector # « XY as positive, meaning that the agent is moving away from the city center, like the vector # « 57.Vice versa, we mark the vector as negative when the agent moves towards the city center, e.g. the vector # « 75.b Trajectory-origin as RP.Identically to what we defined for the city center but we use the origins of trajectories 1 or 5, respectively, as RP for the position vectors.c Sketch of the method to build the vector field.The space is divided in a grid, vectors departing from stops in a cell i are normalized and summed vectorially to produce #« Ti.Then we define the mobility field # « Wi dividing the vector #« Ti by the number of trips departing from i.
vergence (Gauss's) theorem and have no curl.We propose three individual mobility models and analyze what is the minimal set of ingredients needed to find fields similar to the empirical ones.Our results show that a distribution of stops decaying with the distance to the city, length optimization for long displacements and randomlike local exploration are fundamental to reproduce empirical mobility fields.This work extends thus the mathematical amenability of mobility data and offers a new approach for individual mobility patterns analysis.

From trips to vectors
A sketch of the method to define a vector field is displayed in Fig. 1, where we show an idealized "circular" city, its central point (black circle) and three trajectories: 1-2-3-4-1, 5-6-4-5 and 5-7-5.Most of our trajectories are closed, although this is not necessary for the method to work.The steps to define the vector field are as follows: 1. We define vectors between consecutive stops.This can be seen, for example, in Fig. 1a and 1b, where the vectors # « 12, # « 23, # « 34 and # « 41 can be extracted from the first trajectory.A vector pointing from one stop X to the next Y is called # « XY and is located on X.

The vectors
# « XY are normalized to obtain the unit vectors # « xy.
3. The space is divided in grid cells of equal area and all unit trip vectors # « xy within each cell i are vectorially summed to define the resulting vector #« T i , which informs on the average mobility direction in i (see Fig. 1c).

#«
T i is normalized by the total number of trips leaving cell i to obtain the mesoscopic mobility vector field # « W i .This process is analogous to defining the gravitational or electrical fields dividing the force by the mass or charge, respectively.

The vectors
# « W constitute thus the mobility field.

From trip vectors to trajectory orientation
In order to characterize how individuals explore space we need to determine whether trips in each area head towards or away from a given reference point (RP).We identify two options for RP: the first one is the city geographical center (Fig. 1a), in this case the RP is absolute and equal for all the trajectories; the second option is to establish the origin of each trajectory as RP (Fig. 1b).This latter option implies that the RP is different for every trajectory.As we will see below, the trajectory-origin RP shows some useful features and most of the results here are, therefore, displayed using such RP unless otherwise stated.
With respect to the chosen RP, we can allocate a sign to each displacement vector # « XY (or # « xy) of any trajectory.Remember that the vector # « XY sits on X, and that every stop X can be described by a position vector #« X from the RP to X.To understand whether the displacement # « XY occurs toward or away from the RP, we compute the angle θ in the range (−180, 180] between the above two vectors (see Fig. 1a  We characterize for every trajectory t whether trips are on average toward (−) or away (+) from the RP by summing the signs of all the displacement vectors.Dividing by the total number of vectors, we obtain the average orientation H t laying in the range between −1 and 1, where H t = 1 means that all the displacements are positive (i.e., away from the RP) and vice-versa for H t = −1.Note that distance or duration of trips are not taken into account.Finally, H t = 0 implies a full balanced trajectory.H t is a microscopic observable that encodes individual behavior features of mobility.
A priori, there is no reason to assume that there should be more displacements in one orientation than in the other.This means that overall there should be as many trajectories with positive or negative H t .We count as N + the number of trajectories with H t > 0, N − the number of those with H t < 0 and N 0 as the number of balanced trajectories.We finally introduce the unbalance ratio ρ as The existence and orientation of the mesoscopic field and the orientation of trajectories are mathematically interlinked.Since every displacement generates an unit trip vector contributing to the overall mobility field, a situation with ρ clearly deviating from one could generate a majority direction for displacements and, consequently, an average field (see Appendix D for a demonstration that ρ < 1 implies the existence of a field).

Empirical trajectories orientation
Knowing that an unbalance in the trajectories may lead to the presence of a mobility field, it is important to test whether empirical trajectories are balanced or not.We consider two datasets: the first one, D1, refers to logistic trucks trajectories departing from the 21 largest Chinese cities, while the second one, D2, refers to Foursquare check-ins of individuals in New York City (NYC) (see Methods and Appendix A for a detailed description of the two datasets).We show the empirical results for ρ in Beijing, Shanghai and Chengdu of D1 using trajectory origins as the RP in Fig. 2. The pie charts show the fraction of positive, negative and balanced trajectories with a fixed number of stops and also the overall results.2. Unbalance of empirical trajectories.Fraction of positive (orange), Ht > 0, negative (blue), Ht < 0 and balanced (green), Ht = 0 trajectories with 3, 4, 5, 6 stops and for all trajectories of trucks departing from Beijing, Shanghai and Chengdu, using the trajectory origin as RP.ρ stands for the ratio between the number of positive and negative trajectories.
trajectories in all the 21 cities analyzed.This reflects that goods delivery trucks tend to reach the furthest location first and then gradually approach the origin (so that H t < 0).Note that negative trajectories dominate independently from the length of trajectories and there is high similarity in the ρ values across cities, overall and for trajectories with a certain number of stops.This result does not hold if the RP is the city center (see Appendix C and Figs.S5-S7).It seems that the analysis performed with the origins of trajectories as RP is able to absorb the details of the cities in terms of shape, streets and communication axes (e.g.highways), hence leaving only individual mobility behavior.This has two consequences: firstly, we can neglect the urban shape when modeling individuals' mobility behavior in terms of ρ; secondly, we can tune the model on an arbitrary city and perform outof-sample accurate predictions.The results are robust for further cities (see below in Understanding the origin of Average distance between stops: d-rand the trajectory unbalance section and Appendix E Figs.S10-S12).Since the definitions of vector signs and trajectory orientations are general, we can apply it to any type of mobility.As a comparative, we perform the same analysis on D2, the check-in records of Foursquare, and find a similar pattern (see Fig. S10).Note that D2 encodes a different type of mobility compared to D1.The values of ρ for D2 with individuals' check-ins are different from the ones obtained from D1 with freight data both for trajectories of a certain stops number only or for all trajectories.

Understanding the origin of the trajectory unbalance
In order to understand the mechanisms leading to the empirically observed unbalance between the signs of the trajectories, we introduce four models in an increasing order of complexity, each with added mechanisms over the previous one to characterize the needed ingredients (see Methods for details).The simplest configuration includes a circular city of radius R. A generic trajectory is composed of a sequence of stops { #« x 0 , #« x 1 , ... #« x s−1 }, with the origin #« x 0 located randomly inside the circular city.
For the first model, called Rand, the other stops locations are selected completely at random in space.We have confirmed that the model trajectories tend to be balance in the thermodynamic limit (large L) and that it does not generate a field (see Fig. S15).This was a sanity check before advancing to more elaborated models and we disregard this model from now on.The next three models are more relevant and are developed making simple behavioral assumptions about spatial navigation, a sketch with their description can be seen in Fig. 3.
The d-rand model follows the same logic but is informed with a spatial distribution of stops D(r) decaying with the distance to the city center.This mimics a random-like exploration but with constraints on the spatial distribution of stops.The next model, d-TSP, allows travelers to reorder the stops to minimize the total distance traveled (with a Traveling Salesman Problem (TSP) optimization algorithm).Finally, d-mix interpolates between the two previous models and the distance optimization only occurs if the average distance between trajectory stops goes over a threshold ℓ c .Trajectories distances are not optimized otherwise.More details on the precise definition of the models are offered in the Methods section below.
The models are informed by the empirical statistics from Beijing (see Methods for further details on the models construction).We tune the d-mix model parameter ℓ c by minimizing the mean absolute error on ρ in Beijing using all trajectories and the trajectory origin as RP.The best results are obtained for ℓ c ≈ 1.72 km, which is a reasonable threshold for route optimization (see Appendix J and Fig. S20).
Violin plots in Fig. 4a show the resulting distribution of ρ for the three stochastic models and the empirical distribution from the 21 cities in D1.We also show the distributions for trajectories with a fixed amount of stops only.The first question to highlight is that all the models produce trajectories that are consistently negative and hence holding ρ < 1.A second aspect is that d-rand generates the lowest values of ρ among all models.This is natural since the location of consecutive stops is random, without ordering them to reduce the distance traveled, and hence, the probability of crossing to the other side of the city (a negative sign for the trip) is high.Many negative trips contribute to an overall negative sign for the trajectories and a lower value of ρ.Models with stops reordering may reduce the number of displacements from one side to the other of the city, and the trajectory signs are less often negative (higher values of ρ).In contrast and following the same argument in the opposite way, d-TSP produces trajectories with the highest value of ρ. ρ for d-mix, on the other hand, lies between these two extremes as do also the empirical values of ρ.When the analysis is restricted to trajectories with a certain number of stops, the stochastic fluctuations are larger since the number of trajectories decreases.However, d-mix fits best to the empirical values of ρ.
In Fig. 4a we show the empirical ρ for trajectories from all the 21 cities in D1 together.The cities contributing to this violin plot are, nevertheless, very heterogeneous in population.This is why in Fig. 4b   only in Beijing).Finally, we can analyze the trajectories generated by the fitted d-mix model by their number of stops.In Fig. 4c, we see that, in general, the agreement for trajectories of a fixed number of stops aligns well with the empirical results from the three cities in Fig. 2. All these results have been confirmed using different values of the modeled city size R and the space considered L (see Methods for details on the models' parameters and Appendix G (Fig. S15) for the robustness check).

The d-mix model mobility field
Since the models produce unbalanced trajectories, i.e. rho < 1, they also generate a net mobility field (see Appendix D for a mathematical proof).Next we study the properties of the field # « W generated by the fitted d-mix model (see Methods for the details of the modeling setting).We consider circular contours of radius r from the city center and analyze the flux of # « W as a function of r.The flux is calculated in both as a surface integral over the circular contour and as the volume integral of the divergence of the mobility field, ∇ # « W (see Methods for the formal flux calculation).According to Gauss' Divergence Theorem, if the field is well behaved the two ways of calculating the flux should yield the same result.This is confirmed in Fig. 5a, where the two calculations of the flux Φ W as a function of the distance are in agreement with a coefficient of determination R 2 P = 0.98.Note that this does not apply to the city area where the estimated fluxes are close to zero.The fact that the Gauss' Theorem is fulfilled is important because it is related to the existence of a source for the field.
A second relevant feature to explore is the field curl.This is connected to the possibility of defining a potential for the field.Fig. 5b displays the module of the curl, which lies in the z-axis perpendicular to the plot.One can distinguish the area of the city in the internal circle.There are some non-zero curl areas, with the highest values concentrated close to the city border.Actually, these values are small when compared with a null model (see Fig. 5c).In this null model, the direction of the d-mix vectors in each cell is randomly reoriented.We call this the "fully random" model and it is intended to assess the level of curl induced only by noise.The overall distribution of curl modules of the field generated by the d-mix has lower variance than the null model one.Furthermore, in Fig. 5d, we compare the average module of the absolute value of the curl enclosed by a circle of radius r from the city center.We see that both models coincide inside the city r ≤ R = 20 km.However, beyond the city area the d-mix model has curl values systematically below the null model ones.This guarantees the possibility to define a potential out of the city for the d-mix mobility field.

Empirical mobility fields
In Fig. 6a-c, we see that the empirical fields generated in the three cities used as example fulfill the Gauss' Theorem as well.Results for further cities are included in Fig. S16.The flux profiles as a function of r show some common features: a first r interval with negative Φ W values (vector field pointing mainly outwards).This area is the core of the city, where the logistic trucks drop goods while most of the origins of the trajectories lie further in the peri-urban area.Despite its negative character, the existence of a net flux is relevant because the agreement between both integrals show that the Divergence Theorem is fulfilled in all the space.Further, the flux becomes positive (vector field pointing on average inwards) and it reaches the half height value at r c = 25 km for Beijing, r c = 22 km for Shanghai and r c = 14 km for Chengdu.Since for the d-mix model the city size approximately coincides with the point at which Φ W (R) ≈ Φ max /2, we will take r c as arbitrary radii of the three cities logistic cores.The fluxes peak further away (∼ 35 km in Beijing, ∼ 50 km in Shanghai and ∼ 20 km in Chengdu) and eventually Φ W decays as r increases.These general features compare well with those of the d-mix model (Fig. 6a), except for the inside city fluxes, which are not well captured by the model.
We display in Fig. 6d-f the average absolute value of the curl enclosed and excluded by a circle of radius r c , i.e. internal and external respectively, to the cities.We find that the empirical curl is systematically lower than the fully-random counterpart in all cases.This implies that empirical potentials can be defined in all the space.Similar results are attained with other cities of D1 (see Fig. S17) and with the Foursquare check-in data in New York City (see Fig. S18).

Potentials
Knowing that we can define a potential for the d-mix model (out of the city) and also for the empirical data everywhere, we plot next the equipotential curves on the maps (Fig. 7).The potential of the d-mix model shows a circular symmetry.This is due to the circular city shape introduced (Fig. 7a) and the isotropic assumption.The contours of the empirical urban areas are dependent on the city shape (Fig. 7b-d), adapting to geographical constraints as in the case of Shanghai with the sea and islands.It is also interesting how the potential highlights the presence of satellite cities as occurs for Beijing and Shanghai.The potential contours plotted extend some tens of kilometers outside the cities, eventually becoming fuzzy.The field continues beyond that point, but given the lack of statistics it is hard to extract meaningful potential contours.

Hybrid d-mix model
We saw how the unbalance ratio of trajectories orientation ρ does not depend on the urban shape when using trajectory origins as RP.This is critical since it allowed us to study the origin of the mobility fields with simplified models.However, the spatial shape of the fields and potentials are inherently connected to the real city configuration.To explore further the validity of the model assumptions, we need to make another step and introduce a hybrid d-mix model.We consider the empirical trajectories one by one, e.g., { #« x 0 , #« x 1 , ..., #« x s−1 }, keeping #« x 0 as origin, but randomizing the order of the other stops.
We then input these trajectories to the d-mix model, reordering them according to the model rules.The resulting trajectories are not necessarily equal to the empirical original ones, although if the model is doing a good work they should be similar.Indeed, we have a coincidence of 79% in Beijing, 71% in Shanghai and 79% for Chengdu.The question is thus whether this over 20% mismatch has mesoscopic effects on the field or not.
The potential estimated from the hybrid d-mix model and the empirical one are compared for the three cities in Fig. 8a-c.We find a good agreement with R 2 P above 0.93 for all the three cities.3D profiles of the potential are displayed in Figs.8d-f for the empirical fields and in Figs.8g-i for the field generated by the hybrid d-mix model.One can clearly appreciate the similarity between modeled and empirical potentials of the same city.This implies that the hybrid d-mix model is capturing well the mechanisms behind the empirical mobility fields and its utility may go beyond its use as explanatory tool for individual behavior as above.The major deviations occur in the largest potential values, close to the maxima and the city centers where the d-mix potential is undefined and for which the model has not been fitted.

DISCUSSION
In this work we have introduced a new way to define a mobility field starting from individual trajectories.This is a major generalization with respect to previous works based on Origin-Destination commuting matrices, and it allows us to study a wide range of mobility data.Besides the conceptual leap, with this new framework we have studied mobility data from two different sources: logistic routes of trucks around and across the 21 largest Chinese cities and Foursquare check-ins in NYC.In all cases, we have found a well-behaved field fulfilling the Gauss Divergence Theorem and with a curl value that it is in general smaller than the one expected by a fully random model.This implies that it is possible to define a potential almost anywhere in metropolitan areas and, consequently, to search for a source for the mobility field.
Starting from individual behavioral assumptions of spatial exploration, we have advanced in the conceptual framework by analyzing the basic ingredients needed to generate mesoscopic mobility fields with features matching those of the empirical ones.We have introduced a metric, the unbalance ratio ρ, to characterize the fraction of displacements in trajectories that move mostly towards or away from a reference point RP (city center or origin of the trajectory).The unbalance among these directions (ρ far from one) implies a net displacement direction and induces a mobility field.This metric allows us to quantify the strength of the factors leading to the formation of the field.We have then introduced a set of minimal models with growing complexity to explore what information is fundamental to generate the field.All the models are based on trajectories, and so the ba- This work has an eminent conceptual side, advancing on the understanding of how the field theory can be applied to the mesoscopic scales of human mobility.Field theory is a fundamental tool in physics with a well equipped set of mathematical results developed for its use, which we hope can be translated to mobility studies in the near future.Additionally, urban poly-centrism and predominant patterns among mobility centers have been recently the focus of many studies due to their association to life quality indexes, city livability [12], walkabil-ity, sustainability, services accessibility and epidemic outbreak susceptibility [32].The potential provides a clear representation of the structure of a city at a mesoscopic scale.It captures the spatial organization and connectivity patterns of mobility centers, offering insights into the distribution of activities, resources and flows.This information can help urban planners to take more informed decisions.

Mobility data and processing
The empirical results of this work are based on two datasets: the first is a truck travel records (D1) [63,64], and the second is the check-in records of Foursquare (D2) [65].
The D1 dataset includes data from over 20 Chinese cities, which can be downloaded from the National Road Freight Supervision and Service Platform (https://www.gghypt.net/).This platform is used to record the real-time geographic locations of all heavy trucks in China and monitor the potential traffic threats.The dataset contains more than 2.7 million travel records, spanning from May 18, 2018 to May 31, 2018.The attributes of one travel record include truck ID, timestamp, longitude, latitude and speed (see Appendix A for more details about the dataset used in this study).
The D2 dataset is from New York city [65].Foursquare is a location-based social network on which users share their coordinates when check-in in (see appendix A for more details about the dataset).This dataset contains 42035 individuals, in which 23520 users have trips among different analysis zones (here the space is divided according 2010 census areas, see https://www.census.gov/geo/mapsdata/maps/block/2010/).

Definition of the RP in D1 and D2
In both D1 and D2, we can define the RP as the city center or the trajectories origins.For the latter case, in D1 this is identified as the truck most commonly visited location with the longest stay times, since this is likely to be the logistic center of operations.In D2, we assign individuals' origins as the location with the largest number of check-ins.In both D1 and D2, a single trajectory is the defined as the sequence of stops occurring between the first and the next stop in the origin.The statistical description of the D1 trajectories in several cities of D1 are provided in Fig. S3, and the same for D2 in NYC in the Fig. S4.

Models
The basic ingredients of our models will be inspired by the structure and statistics of the empirical trajectories.Fig. S3 shows the complementary cumulative distributions of three variables associated to trajectories starting in a circle of radius 20 km centered at Beijing, Shanghai and Chengdu as paradigmatic examples.The first distribution, D(r), refers to distance of the trajectory stops to the city centers (Fig. S3a-c).The city centers are the barycenters of the areas considered (see Table S2).
As the figure shows, the location of the stops can be relatively far away from the city center, with the distribution falling slowly to the thousands of kilometers.We will adopt in the models a probability D(r) of finding a stop at a certain distance r of the city center that on very first approximation will fall as D(r) ∼ r −α .The next distribution, Fig. S3d-f, refers to the number of stops per trajectory N s (s).The minimum number of stops is 2, because we are counting at least origin and final destination.The range of values is relatively limited, up to 40, and presents a decay that we will approach in the models by N s (s) ∼ s −β .Finally, Fig. S3g-i shows the distribution of the number of trajectories starting from the same origin N t (t).Truck fleets may have an operation center from which several vehicles leave, or simply the same truck appears in several trajectories starting always from the same origin.The distribution is wide, reaching more than one thousand trajectories and we will approximate it by N t (t) ∼ t −γ .The distributions for the Foursquare check-in in New York City can be found in Fig. S4.
If we consider as in Fig. 1a circular city of radius R inside of a space limited by a square of side L, we can build a first null model by selecting the location of the origin of a trajectory at random in the internal circle #« x o .We call this model the Rand model.The number of trajectories departing from #« x o is then obtained as a stochastic extraction of N t (t), while for each of them the number of stops s can be extracted from N s (s).The location of the s − 1 consequent stops is randomly chosen within the bounding square.We take a setting with a circular city of radius R = 20 km centered in a squared area of side L = 400 km.This is to be considered the general setting on which we run all our models.The model is run to generate 100 000 trajectories and with them produce a field # « W following the recipe of Fig. 1.In each perimeter position, we observe that the flux of the field as a function of r fluctuates around zero inside the city circle R and it only gets negative as r reaches close to the bounding box L (see Fig. S13).Such negative net flux is only a finite-size effect as can be seen by increasing the box size L and also by using periodic boundary conditions in the bounding box instead of open ones (Fig. S14).

The d-rand
There are, therefore, missing ingredients in this basic model to be able to generate a stable mobility field.The first mechanism that we are going to consider is a spatial distribution of stops falling with the distance to the city center D(r) as the one observed in Fig. S3a  (D(r . This model will be called d-rand (Fig. 3a) and it consists in randomly extracting, as before, a location in the circle containing the city for #« x 0 , the number of trajectories starting at #« x 0 from N t (t) and the number of stops per trajectory s from N s (s).Then, for each trajectory we choose at random with D(r) the radius of the location of the s − 1 stops besides the origin, the directions from the center in which every stop lies is also randomly selected.A trajectory is thus formed by the origin and all the other stops { #« x 0 , #« x 1 , ..., #« x s−1 }.As we will see, this model is able to produce unbalanced trajectories and a field.
d-rand has, however, a major caveat: consecutive stops can be at opposite sides of the city and it is unrealistic to have a driver passing back and forth through the city center without grouping nearby stops to reduce the total distance traveled and the fuel consumed.

The d-TSP
The next model to consider, called d-TSP (Fig. 3b), corresponds to the effect of a manager looking at the sequence { #« x 0 , #« x 1 , ... #« x s−1 } obtained as in the d-rand and reordering the sequence of stops from 1 to s − 1 to minimize the total trajectory distance.This process can be mapped into the well known traveling salesman problem (TSP), in which a salesman needs to visit a set of locations, each location is visited once and only once, and finally must return to the starting position [66].We employ in the d-TSP an heuristic algorithm (genetic algorithm [67]) developed to approximate the solution of the traveling salesman problem.

The d-mix
Finally, we introduce a model that interpolates between d-rand and d-TSP.We will call this model d-mix and the rules are as illustrated in Fig. 3c.The TSP reordering of stops is only allowed if the average travel distance of one trajectory between stops is larger than a threshold ℓ c .The idea behind d-mix is that the driver will not invest the effort of optimizing the trajectory if the distance between consecutive stops is very short.The limit of d-mix model for small ℓ c corresponds thus to d-TSP model, while for large ℓ c it becomes d-rand model.

Numerical calculation of the flux
The definition of the flux is where the integral is over the surface (perimeter S) enclosing the area of interest, dℓ is the element of surface, #« n the unit vector normal to the surface and # « W the vector field.From a numerical perspective, the integrals are calculated as where the index i runs over all the cells intersecting the perimeter S, #« n i is the unit vector normal to the surface in i and dℓ is approximated by the total perimeter of S divided by the number of intersecting cells.
The definition of the integral of the divergence is where the integral is now of volume (surface in 2D of the enclosed area).From a numerical perspective, the integrals are calculated as where the index i runs over the cells in the volume V and dV is approximated by the area of the unit cell.W x and W y are the x and y components of the vector # « W. The indices (α, β) refer to the position of cell i in the grid, in such a way that, for instance, (α ± 1, β) are the positions of the adjacent cells to i in the x-direction.∆x and ∆y are side sizes of the cells in the x− and y− directions.

Numerical calculation of the curl
The curl of # « W in the cell i, whose indices in the x− and y− directions are (α, β), as above, is determined as:

Numerical calculation of the potential
The potential is calculated by numerically solving the equations −∇V = # « W, taking into account that ∇× # « W = 0.For the computation of the empirical potential, we used conditions V = 0 in all the boundary regions of the box and then use the forward centered discretization formula for the gradient operator [58,68] starting from the city bounding box corner.In a cell i with indices (α, β) and also the procedure is iterated until all cells have been assigned a potential.To decrease the noise, we average the resulting potentials starting from the four corners of the bounding box .

Parameter estimation
In our d-mix model, the single free parameter is ℓ c , which directly determines whether the agents optimize or not the order of the trajectory stops.For a given empirical dataset, we rely on ρ to estimate ℓ c .To accomplish this, we define the following function where ρ data is obtained from the dataset for all trajectories, and ρ(ℓ c ) is calculated through the d-mix model with parameter ℓ c .The objective function can be minimized to yield an estimated value of ℓ c needed to reproduce with d-mix the signs of the set of empirical trajectories.The logistic data (D1) we used comes from truck travel records of 21 Chinese cities, which can be downloaded from the National Road Freight Supervision and Service Platform (https://www.gghypt.net/).The D1 dataset contains more than 2.7 million truck travel records, spanning from May 18, 2018 to May 31, 2018.Each truck travel records consists of truck ID, longitude, latitude, speed, timestamp, see Table S1.In previous works data treatment was performed in order to identify trip ends and obtain track sequence according to the identified trip ends [63,64].For a truck's base point, it could be a logistics center, or a freight hub.A truck frequently returns to the base point, due to the driver ending their work shift or reloading to start the next round of freight distribution.In view of this, we consider the most visited location by a truck as its base point (origin) and extract the truck delivery route from the track sequence.
As shown in Fig. S1, the 21 cities in D1 the most densely populated cities (e.g., Beijing and Shanghai) and less densely populated cities (e.g., Xining).These cities have different sizes, economies, cultural backgrounds and infrastructural setting.This gives us an important opportunity to study mobility patterns in heterogeneous social contexts.In Table S2, we list the trajectories, city center, population and GDP of the 21 cities we consider.

Foursquare check-ins data
The Foursquare check-ins data (D2) is from New York City (NYC).Foursquare is a location-based social network on which users share their coordinates when checking in.The D2 dataset contains 42035 individuals, in which 23520 users register trips among different anal-    ysis zones (here the space is divided according 2010 census areas, see https://www.census.gov/geo/mapsdata/maps/block/2010/).We consider the most common check-in zone as the individual's trajectory origin.
Similarly to what described when dealing with logistic data, we extract a sequence of consecutively visited locations with the last location corresponding to the initial one [65].Check-in times and zones distribution in NYC, see Fig.   (c,f,i) number of truck trajectories with the same origin.The lines in each panel represent a power-law fit to the real data.The dashed lines are only illustrative, to an impression of a simple function approaching the data to be used in the models, and not a product of fits.The trajectories considered for the statistics are those whose origin lays in a circle centered in three cities and with a radius of 20 km.of: a the distance between the trajectory stops and the city center; b number of stops per trajectory; c number of truck trajectories with the same origin.The lines in each panel represent a power-law fit to the real data.The dashed lines are only illustrative, to give an impression of a simple function approaching the data to be used in the models, and not a product of fits.The trajectories considered for the statistics are those whose origin lays in a circle centered in three cities and with a radius of 5 km.FIG.S5.Unbalance of empirical trajectories of using geographical center of the city as RP.Fraction of positive H(t) > 0, negative H(t) < 0 and balanced H(t) = 0 trajectories with 3, 4, 5, 6 stops and for all trajectories of trucks serving Beijing, Shanghai, Chengdu, Tianjin, Nanjing, Hangzhou and Hefei.

City
Appendix C: The geographical city center as RP In the main text, we use the origin of each trajectory as the RP.Here we show the results when we consider the geographical center of the city as RP.As shown in Figs.S5-S7, the fraction of positive H(t) > 0, negative H(t) < 0 and balanced H(t) = 0 trajectories with 3, 4, 5, 6 stops and for all trajectories of trucks serving the Chinese cities.
Compared to use the origin of each trajectory as the RP, we can find that trajectories orientation across cities is significantly different and the values of ρ for all trajectories is related to the urban population.This is probably because when we use the city center as RP, some details of the city are taken into account, such as the city shape, distribution of stops, streets and communication axes (e.g.highways).

Appendix D: Relationship between ρ and the field
The unbalance factor ρ is defined based on the trajectories.Recalling the form of ρ: where N + and N − are the number of positive and negative trajectories, respectively.Besides signed trajectories, there are also balanced ones up to a number N 0 .The total number of trajectories is thus The objective here is to prove that ρ < 1 implies the existence of a mobility field.Note that this is a necessary condition, but not a sufficient one, since as we will see a field can still exist even if ρ = 1.
Every trajectory is composed by a sequence of s stops { #« x 0 , #« x 1 , ..., #« x s−1 }. Between each couple of consecutive stops i and j, we have defined a unit vector #« ij that has a sign depending on whether they point toward the RP or away.Each trajectory has s vectors and its final sign is the majority sign of them.These unit vectors, summed up vectorially in each cell of the space grid, generate the field.In order for a field to exist it is thus necessary to have an unbalance either global or local in the number of positive and negative vectors.In fact, we can define an auxiliary ratio where S + and S − are the global number of positive and negative, respectively, unit vectors.Note that ρ ′ < 1 implies the unbalance of the unit vectors and, therefore, a global mobility field.Given that each trajectory has generically s stops and, consequently, s unit vectors, we will call s + and s − , respectively, to the number of positive and negative vectors.We can then write that s = s + + s − in each trajectory.When considering all the trajectories, we can define ⟨s + ⟩ + and ⟨s − ⟩ + as the average number of positive and negative vectors in positive trajectories.The corresponding averages for negative trajectories are ⟨s + ⟩ − and ⟨s − ⟩ − , and for balanced trajectories ⟨s + ⟩ 0 and ⟨s − ⟩ 0 .With these averages, we can write ρ ′ as Dividing above and below in the ratio by N − , we find Some conditions are clear by definition: The first is that ⟨s + ⟩ + > ⟨s − ⟩ + and ⟨s − ⟩ − > ⟨s + ⟩ − .The second one is that ⟨s − ⟩ 0 = ⟨s + ⟩ 0 = s 0 /2, where s 0 is the average length of balanced trajectories.Then ρ ′ can be written as Modifying this equation to have ρ as a function of ρ ′ , we can impose the condition that ρ < 1 to obtain: (D6) This implies that for ρ < 1, ρ ′ must be A table with the statistics of the data trajectories and for those of the d-mix model are provided in Table S4.
Inputing this information into Eq.(D7), we obtain the values of ρ ′ sup shown in Table S5.All of them are smaller or equal than one and, therefore, ρ < 1 implies ρ ′ < 1.
An alternative way of demonstrating this condition and gaining some insights in the process on the relation between ρ and ρ ′ is to wonder when ρ is smaller or larger than ρ ′ .We retake Eq. (D5) and impose that ρ ′ > ρ, this yields: The left side of the expression is a linear function L(ρ) in the domain of ρ values, while the right side R(ρ) is a parabola so at certain point for a value of ρ, ρ c , both curves cross (L(ρ c ) = R(ρ c ), see Fig. S8).For ρ < ρ c , we have ρ ′ > ρ, while we have ρ ′ < ρ for ρ > ρ c .The crossing point occurs when Therefore, the expression for ρ c is The question is thus whether ρ c is larger or smaller than one.If ρ c < 1, then when ρ approaches 1 from below but it is larger than ρ c then we have that ρ ′ < ρ < 1 and, therefore ρ < 1 implies as well that ρ ′ < 1. Table S5 shows the values of ρ c for all the datasets and the models.In all the cases, ρ c < 1. Therefore in our case, the condition ρ < 1 =⇒ ρ ′ < 1 holds.
Note that this is not a general demonstration, it is only valid for our cases (empirical datasets and models).
There can be values of the statistics of the trajectories for which this relation between ρ and the field does not hold.This second demonstration is interesting because it could help to determine in more general cases the conditions for the existence of the field given the value of ρ and the critical value ρ c .In the main text, we have shown the unbalance of empirical trajectories in Beijing, Shanghai and Chengdu.To show our results are robust, here we show the fraction of positive H(t) > 0, negative H(t) < 0 and balanced H(t) = 0 trajectories with 3, 4, 5, 6 stops and for all trajectories of individuals in NYC (see Fig. S9) and trajectories of trucks in more Chinese cities (the other 18 Chinese cities), using the trajectory origin as RP, see Figs.S10-S12.
Combining the results of the main text, we can find that the fraction of N + and N − for trajectories of a fixed number of stops and any number of stops is very close across cities.The number of trajectories with negative orientation N − is dominant, hence ρ < 1 in all cities.
Note that Foursquare check-ins data encode a different type of mobility compared to logistic data.The values of ρ of individuals' check-ins are different from the one obtained from logistic data both for trajectories of a certain stops number only or for all trajectories.
-+ 0 tions, we observe that the flux decreases (it is more negative) as we get closer to the bounding box.However, with periodic boundary conditions, we find that the flux is fluctuating around zero and, therefore, the vectors are random in direction and module.
FIG. S12.Unbalance of empirical trajectories in Chinese cities. Fraction of positive H(t) > 0, negative H(t) < 0 and balanced H(t) = 0 trajectories with 3, 4, 5, 6 stops and for all trajectories of trucks serving Nanning, Guiyang, Shenyang, Xi'an, Harbin and Xining, using the trajectory origin as RP.Gauss' Theorem as well.Results for the other cities (18 cities) are included in Fig. S16.The centers of the cities for the flux calculations used to check Gauss' Theorem have been taken at locations shown in Table S2.We can find that the empirical fields generated 18 cities all fulfil the Gauss' Theorem as well.Note that the value of coefficient R 2 p in Haikou, Lhasa and Xining we obtained is a little small, this because the number of trucks in three  cities is relatively rare, see Table S2.
Appendix I: Fit of the average distance ℓc for the d-mix model The d-mix model has a parameter ℓ c that directly determines whether the agent takes the optimal or a ran-dom route.For a given empirical dataset, we rely on the track orientation metric ρ and use the Eq. ( 11) in the main text to estimate ℓ c .In Fig. S20, we show that the mean absolute error E(ℓ c ) registers a minimum for an average distance ℓ c = 1.72 km.For lower values of ℓ c , individuals optimize their trajectory even for very short trips.Vice versa, for higher values of ℓ c , individuals take random routes even FIG.1.Definition of trajectories orientation and resulting vector field.The large gray circle stands for an idealized city (with the black point as the city center), the green numbered circles are the stops sequence of each trajectory (1-2-3-4-1, 5-6-4-5 and 5-7-5), while the square in 1 and the hexagon in 5 are the trajectories' origins.a City center as RP.Vectors of each color connect consecutive stops of a trajectory, for example, the trajectory 1-2-3-4-1.When the vector connecting X to Y forms an acute angle with the position vector #« X starting in the city center, we mark the vector # « XY as positive, meaning that the agent is moving away from the city center, like the vector # « 57.Vice versa, we mark the vector as negative when the agent moves towards the city center, e.g. the vector # « 75.b Trajectory-origin as RP.Identically to what we defined for the city center but we use the origins of trajectories 1 or 5, respectively, as RP for the position vectors.c Sketch of the method to build the vector field.The space is divided in a grid, vectors departing from stops in a cell i are normalized and summed vectorially to produce #« Ti.Then we define the mobility field # « Wi dividing the vector #« Ti by the number of trips departing from i.

FIG. 3 .
FIG.3.Schematic description of the models.The large gray circle represents the city area (the black circle is the city center), the numbered green circle represent stops while square and hexagon are the origins of two trajectories (red and green).These trajectories have different average trips distance: the green one is below ℓc, the red trajectory is above ℓc.a d-rand, b d-TSP, c d-mix.

FIG. 4 .
FIG. 4. Models and trajectory unbalance.a Violin plots of the model-predicted and real values of ρ for trajectories with different number stops.The violin plots depict the values of ρ from d-rand model (in green), d-mix model (in yellow), the data in D1 (in purple) and d-TSP model (in red) for trajectories with 3, 4, 5, 6 stops and for all trajectories.b Values of ρ for the different cities as a function of their population.The horizontal dashed lines correspond to the median values of ρ for all the modeled trajectories, shaded areas indicate the confidence interval between 5% and 95%, color codes as above.Purple dots represent the values of ρ for all empirical trajectories in each city.c Pie charts for the fraction of positive, negative and balanced trajectories generated with the fitted d-mix model.These sequence of charts should be compared with the empirical values observed in different cities in Fig. 2. For these simulations, we used R = 20 km and L = 400 km and with 100 000 trajectories.

FIG. 5 .
FIG. 5. Properties of the d-mix mobility field.The mobility field is obtained from the d-mix model with ℓc = 1.72 km with a circular city of radius R = 20 km in a space limited by a box of side L = 400 km and with 100 000 trajectories.a Comparison between the flux measured as the surface integral in blue, and as the volume integral of the divergence of the mobility field (see Methods for the calculation) in red, in both cases as a function of the distance to the city center r.The coefficient of determination R 2 p is obtained as the square of the Pearson correlation coefficient of both curves.b Module of the curl of the field, the colors represent |∇ # « W| for each cell in km −1 .c Comparison between the curl of the field generated with the d-mix model and that of the fully random model obtained by randomly reassigning directions to # « W in each cell.d Average absolute value of the curl as a function of the distance to the city center for the d-mix and the fully random model.

FIG. 6 .
FIG.6.Empirical vector fields.In a-c, fluxes of the empirical field #« w as a function of the distance to the city center r.In blue the surface integral flux, in red the volume integral flux, both as a function of r.R 2 p is the coefficient of determination between the curves of the flux and the volume integral of the field divergence as a function of the distance .d-f Average module of the curl, comparing the fully-random model and the empirical data.The separation between in and out of corresponds to the distance at which ΦW = Φmax/2, rc, as described in the text.The analysis in the outer side ends at r = 50 km.After this distance, trip vectors in Beijing and Chengdu become sparse and the statistics is nonrepresentative.

FIG. 7 .
FIG. 7. Equipotential contours.a Contours for the dmix model with 100 000 trajectories in a circular city of radius R = 20 km and in a box of side L = 400 km.Empirical equipotential contours for b Beijing, c Shanghai and d Chengdu.

FIG. 8 .
FIG. 8. Hybrid d-mix model predictions of empirical potentials.a-c Correlation plots between the hybrid d-mix model and the empirical potentials, yielding Rp 2 = 0.97 for Beijing, Rp 2 = 0.93 for Shanghai and Rp 2 = 0.96 for Chengdu.d-f Empirical potential in the space and g-i the hybrid d-mix model predictions.Both models and empirical flows show the polycentric nature of Beijing and Shanghai, while Chengdu is more monocentric.
FIG. S1.Considered cities in logistic data.The locations of the 21 cities and the number of trucks in each city.
FIG. S2.Foursquare users check-ins in New York City.Check-in times and zones distribution NYC. S2 FIG.S3.The statistical description of Beijing, Shanghai, Chengdu.The complementary cumulative probability distributions (CCDF) of : (a,d,g) the distance between the trajectory stops and the city center; (b,e,h) number of stops per trajectory; (c,f,i) number of truck trajectories with the same origin.The lines in each panel represent a power-law fit to the real data.The dashed lines are only illustrative, to an impression of a simple function approaching the data to be used in the models, and not a product of fits.The trajectories considered for the statistics are those whose origin lays in a circle centered in three cities and with a radius of 20 km.
FIG. S4.The statistical description of New York.The complementary cumulative probability distributions (CCDF) of: a the distance between the trajectory stops and the city center; b number of stops per trajectory; c number of truck trajectories with the same origin.The lines in each panel represent a power-law fit to the real data.The dashed lines are only illustrative, to give an impression of a simple function approaching the data to be used in the models, and not a product of fits.The trajectories considered for the statistics are those whose origin lays in a circle centered in three cities and with a radius of 5 km.
FIG. S13.Properties of the flux of Rand model with open boundary conditions.Here we set the values of R = 20 km in a space limited by a box of side L = 100 km in a and L = 400 km in b with 100 000 trajectories.The flux is displacing to the right as the value of the bounding box frame L increases.
FIG. S14.Properties of the flux of Rand model with periodic boundary conditions.Here we set the values of R = 20 km in a space limited by a box of side L = 400 km periodic boundary conditions with 100 000 trajectories.
FIG. S16.Fluxes of the empirical field in Chinese cities.We consider trajectories in the remaining 18 Chinese cities not considered in the main text.Fluxes of the empirical field #« w as a function of the distance to the city center r.The blue symbols correspond to the flux calculated as a surface integral and the red ones to the volume integral of the field divergence.The coefficient R 2 p is obtained from a Pearson correlation of one flux with the other along the distance r.
FIG. S17.Average module of the curl in Chinese cities.We consider trajectories in the remaining 18 Chinese cities not considered in the main text.Average module of the curl, comparing fully-random model and empirical data.The separation between in and out is defined by the radius rc at which Φ = Φmax/2.The inner part is enclosed by the radius rc, the remaining area is the outer part, which ends at r = 50 km.After this distance, trip vectors become very sparse and statistics become non-representative.
FIG. S18.Empirical vector field in NYC.In a, fluxes of the empirical field #« w as a function of the distance to the city center r.The blue symbols correspond to the flux calculated as a surface integral and the red ones to the volume integral of the field divergence.The coefficient R 2 p is obtained from a Pearson correlation analysis of one flux with the other along distance r. b Average module of the curl as a function of the distance to the city center r.Inbox, comparing the empirical and randomized model average curl modules inside the city and outside.The separation between inside-outside has been arbitrarily establish at R = 4 km in NYC.c Empirical equipotential contours for NYC.

FIG. S20 .
FIG. S19.Hybrid model predictions of empirical potentials a Correlation plots between the hybrid d-mix model and the empirical potentials, yielding Rp 2 = 0.9.b Empirical potential in the space and c the hybrid d-mix model predictions.
and 1b), where |θ| = 0 means moving straight away from the RP and |θ| = 180 means moving strictly toward the RP.We assign the vec- Trajectories with negative orientation (i.e., N − ) are dominant, hence ρ < 1 in all cases.Results are robust for trajectories of any number of stops and for all FIG.

TABLE S1 .
List of examples of heavy truck GPS data.

TABLE S2
. Number of trajectories, city center, population and GDP of the 21 cities.

TABLE S3 .
List of power-law exponents of Beijing, Shanghai, Chengdu and NYC for the three distributions considered.

TABLE S5 .
List of the values of ρc and ρ ′ sup for all the datasets, d-mix, d-TSP and d-rand model.
In the main text, we have defined the Rand model.Here we run the Rand model to generate trajectories in two different scenarios.The first scenario is a circular city of radius R within a square of side L, the second one is a circular city of radius R within a space with periodic boundary conditions.We check the features of Rand model flux as a function of the distance to the city center, see Figs.S13-S14.With open boundary condi-