Article | Open Access | Published:

Diversity of individual mobility patterns and emergence of aggregated scaling laws

Scientific Reports volume 3, Article number: 2678 (2013) | Download Citation

Abstract

Uncovering human mobility patterns is of fundamental importance to the understanding of epidemic spreading, urban transportation and other socioeconomic dynamics embodying spatiality and human travel. According to the direct travel diaries of volunteers, we show the absence of scaling properties in the displacement distribution at the individual level,while the aggregated displacement distribution follows a power law with an exponential cutoff. Given the constraint on total travelling cost, this aggregated scaling law can be analytically predicted by the mixture nature of human travel under the principle of maximum entropy. A direct corollary of such theory is that the displacement distribution of a single mode of transportation should follow an exponential law, which also gets supportive evidences in known data. We thus conclude that the travelling cost shapes the displacement distribution at the aggregated level.

Introduction

Positioning systems in mobile phones and vehicles and Wi-Fi devices in laptop computers and personal digital assistants have made quantitative analyses of human mobility patterns possible1,2,3,4. These analyses have a significant potential to reveal novel statistical regularities of human behavior, refine our understanding of the socioeconomic dynamics embodying spatiality and human mobility5,6, and eventually contribute to controlling disease7,8,9,10, designing transportation systems11, locating facilities12, providing location-based services13,14,15, and so on.

Aggregated data from bank notes16, mobile phones1 and onboard GPS measurements3 showed that the displacement distribution of human mobility, for both long-range travel and daily movements, approximately follows a power law. The scaling laws in long-range travel may result from the hierarchical organization of transportation systems17, while the scaling laws in daily movements have recently been explained by the exploration and preferential return mechanism18.

Thus far, we still lack solid results about human mobility patterns at the individual level. Inferring individual features from the aggregated data is very risky because the scaling law for the population could be a mixture of many individuals with different statistics19. In addition, the aforementioned data are not sufficient to draw conclusions at the individual level. First, data such as GPS records from taxis and the trajectories of bank notes consist of many individual movements, but these individuals are not easy to be distinguished from each other. Second, data such as GPS records from mobile phones and the trajectories of bank notes could not accurately capture purposeful travels with explicit origins and destinations. In fact, the displacement between two activations of a mobile phone may be just a tiny portion of a purposeful trip or a combination of several sequential trips, while the displacement between two registrations of a bank note could be the result of a number of sequential trips made by different people.

Instead of using proxy data, we analyze the travel diaries of hundreds of volunteers. Though the data set is small, it contains personal profiles and explicit positions of origins and destinations, allowing quantitative and authentic analyses at the individual level. In contrast to the scaling laws in aggregated data, individuals show diverse mobility patterns, and few of them display the scaling property. In fact, the trajectories of students and employees are dominated by trips connecting homes with schools and workplaces, respectively, while trips are distributed more homogeneously among different locations for others such as retirees, homemakers and unemployed people. The aggregated displacement distribution follows a power law with an exponential cutoff, which can be analytically explained by the mixed nature of human travel under the principle of maximum entropy. In addition, this theory predicts that the displacements using a single mode of transportation will follow an exponential distribution, which is also supported by the empirical data on taxi trips, car trips, bus trips and air flights.

Results

Individual mobility patterns

Our analysis of human mobility is based on a data set of 230 volunteers' six-week travel diaries in Frauenfeld, Switzerland20. This data set contains the volunteers' personal information, including age, job and sex, and 36761 trip records. By calculating the spherical distance between the origin and destination from their longitudes and latitudes, we can obtain the length of each trip (see details about data in Methods).

We first measure the individual displacement distributions from the data set. Figs. 1(a)–1(c) show three typical individuals' displacement distributions (Table S1 presents all volunteers' displacement distributions), from which we cannot find any universal scaling properties. Indeed, when we use the Kolmogorov-Smirnov test21 to test whether the distributions fit power laws, we find that 87.8% of the individuals cannot pass the test (statistical validation results are listed in Table S2, and the details about Kolmogorov-Smirnov test are shown in Methods). This result strongly suggests the absence of scaling laws in human travel at the individual level.

Figure 1: Individual mobility patterns.
Figure 1

(a–c) Displacement distributions for three typical individuals ((a) a student, (b) an employee, (c) a retiree), where the peak values for the student and the employee result from the trips between two most frequently visited locations. (d–f) Mobility networks for the three individuals, where the area of a node is proportional to its number of visits and the width of an edge is proportional to its weight.

To reveal the underlying structure of individual trips, we assign to each individual a mobility network, in which nodes denote locations visited by individuals, edges represent the trips between nodes and edge weight is defined as the number of corresponding trips22. Figs. 1(d)–1(f) show three typical individuals' mobility networks (all networks are presented in Table S1). As shown in Fig. 1 and Table S1, for most students and employees, their edge weights are highly heterogeneous. For each individual, we call the trip corresponding to the edge with the largest weight the dominant trip and define the domination ratio d as the ratio of the weight of the dominant trip to the total weight. Fig. 2 reports the distribution of domination ratios for different groups of individuals, from which we can see that the student group has the largest d on average and the employees' average domination ratio is smaller than that of the students but larger than that of the other group.

Figure 2: Distribution of the domination ratios.
Figure 2

(a) Population. (b) Student group. (c) Employee group. (d) Others. S is the number of group members, and is the average domination ratio.

The difference of d results from the fact that students and employees frequently travel between homes and schools/workplaces in working days but retirees or homemakers do not have to do so. The peak values in the displacement distributions of students and employees are thus usually determined by the lengths of their dominant trips. Because the lengths of dominant trips are not necessarily small, the displacement distribution for an individual is usually not right-skewed and is far different from a power law. In addition, the significant role of the dominant trip indicates that an individual's traveling process in general cannot be characterized by the Lévy flight16 or truncated Lévy flight1.

Scaling property in aggregated data

The aggregated displacement distribution of individuals (see Fig. 3) is well approximated by a power law with an exponential cutoff P(r) r−1.05 exp(−r/50) (the fitness significance p-value by the Kolmogorov-Smirnov test21 is 1.000 and the standard Kolmogorov-Smirnov distance D is 0.039, see Methods and Fig. S1 for details), which is similar to those observed for bank notes16 and mobile phone users1. As shown above, this scaling property is not a simple combination of many analogous individuals. We assume that the total travel cost is C, the number of trips with cost ci is ni. According to the maximum entropy principle23, the two constraints, Σni = N and Σnici = C, lead to the solution , where is the average travel cost. Denote the density of trips with cost c by P(c), then .

Figure 3: Displacement distribution P(r) of the aggregated data.
Figure 3

The solid line indicates a power law with an exponential cutoff. The data were binned using the logarithmic binning method (see Methods for details).

The travel cost is commonly approximated as the weighted sum cηt + μm, where η and μ are two coefficients, and t and m are the costs involving time and money, respectively. Previous empirical studies have suggested that the monetary cost is approximately proportional to the travel distance as mνr24, while the travel time approximately obeys a hybrid form tϕ ln r + ωr + ψ25,26, where ν, ϕ, ω and ψ are coefficients. The logarithmic term results from the mixture of modes of transportation27. Apparently, people move faster when traveling longer distances: we walk from classroom to office but take an airplane from US to China. Figure S2 reports the statistics related to travel times of the data set used in this paper. Although the data set is not large enough and contains some noisy points, overall speaking, the travel time t grows in a hybrid form as mentioned above, with ϕ ≈ 9 and ω ≈ 0.4.

Integrating the aforementioned terms, we obtain the displacement distribution P(r) (β/r + 1/κ)rβ exp(−r/κ), where and . When κ is large, the distribution is approximated as a power-law with an exponential cutoff. Indeed, for the real data, κ = 40 and β = 0.38, so for r < 100, that is, the term can be neglected. As shown in Fig. S3, the corresponding fitting line is very close to a power law with an exponential cutoff (but with a slightly higher power-law exponent 1.38).

A direct corollary of maximum entropy principle is that the displacement distribution should follow an exponential form if it only accounts for trips from a single mode of transportation because in that case, c r. This corollary gets supportive evidences from a number of empirical studies on disparate systems28,29,30,31,32,33 (Bazzani et al.28 observed a slight deviation from the exponential law). Fig. 4 reports empirical cumulative distributions for taxi trajectories in Beijing31, car trips in Detroit (downloaded from www.semcog.org), bus trips in Shijiazhuang (collected by the authors) and air flights in the US30. The probability density distributions are shown in Fig. S4. All distributions can be well characterized by exponential-like functions.

Figure 4: Cumulative displacement distributions for a single mode of transportation.
Figure 4

(a) 12,028,929 taxi passenger trajectories in Beijing. (b) 46,541 car trips in Detroit. (c) 783,210 bus trips in Shijiazhuang. (d) 205,534 air-flight travels in US.

Discussion

The general lessons that we learned from the present analysis could be used to refine our knowledge of human mobility patterns. The displacement distributions for aggregated data usually display power-law decay with an exponential cutoff. Meanwhile, there are examples ranging from taxi trips to air flights in which the displacement distributions are exponential. In these examples, every displacement distribution is generated by trips involving a single mode of transportation, which corresponds to a linear relation between the travel cost and distance and eventually results in an exponential displacement distribution according to the principle of maximum entropy. In a word, we believe the travel cost is one main reason resulting in the regularities in aggregated statistics. The present results suggest that the form (power law or exponential or other) of deterrence function in the gravity law for human travel34 may be sensitive to the modes of transportation under consideration.

This study warns researchers of the risk of inferring individual behavioral patterns directly from aggregated statistics. Analogously, the temporal burstiness of human activities is widely observed, and the researchers are aware of the fact that the aggregated scaling laws could either be a combination of a number of individuals, each of whom displays scaling laws similar to the population35, or the result of a mixture of diverse individuals, most of whom exhibit far different statistical patterns than the population36,37,38. In comparison, such issues are less investigated for spatial burstiness. In particular, experimental analyses on individuals has rarely been reported. Determining whether the displacement distribution of an individual follows a power-law distribution will require further data and analysis.

It is already known to the scientific community that a number of Poissonian agents with different acting rates can make up a power-law inter-event time distribution at the aggregated level36,37,38, and very recently, Proekt et al.39 showed that the aggregated scaling laws on inter-event time distribution may be resulted from different time scales. Petrovskill et al.19 have applied similar (yet different) idea in explaining the aggregated scaling laws in walking behavior. Although being mathematically and technically different, this work embodies some similar perspectives, because the different transportation modes indeed assign different scales onto space: the world becomes smaller by air flights while a city is really big by walking. Elegant analogy between temporal and spatial human behaviors will benefit the studies of each other.

Many known mechanisms underlie the scaling laws of complex systems40,41,42, including rich get richer43,44,45, good get richer46,47, merging and regeneration48, optimization49,50, Hamiltonian dynamics51, stability constraints52, and so on. The individual mobility model by Song et al.18 is a typical example embodying the rich get richer mechanism. We have implemented such model. As shown in Fig. S5, the exploration and preferential return model can well reproduce the diversity of individual mobility patterns. In addition, for this model, the Gibbs entropy of the displacement distribution at the individual level increases continuously due to the increasing number of locations as well as links connecting location pairs. However, the exploration and preferential return model does not explain why the lengths of exploration trips should follow a power law, which is a core assumption leading to the power-law-like aggregated displacement distribution. Therefore, our work has complemented Ref. 18 and other related works in two aspects: (i) providing supportive empirical observation at the individual level; (ii) providing alternative explanation on the emergence of scaling in aggregated displacement distribution. Very recently, from the analysis on mobility patterns in an online game, Szell et al.53 observed a characteristic jump length and guessed that the existence of the characteristic length may be due to the single mode of transportation. The present theory could explain their observation since a jump in such online game costs time that is proportional to the jump length.

Methods

Data description

This work was performed using a travel survey data set that contains 230 volunteers' six-week travel diaries in Frauenfeld, Switzerland20. The survey was conducted among 230 volunteers from 99 households in Frauenfeld and the surrounding areas in Canton Thurgau from August to December 2003. The volunteers reported their daily travel by filling out (paper and pencil based) self-administrated questionnaire day by day in a six-week period. Each reported trip includes the information of origin, destination and purpose. The origin and destination of a trip were geocoded by longitude and latitude. The quality of the geocoding is very high - with 60% of trips captured within 100 m of their true origins and destinations and 90% within 500 m. The purpose of trip was classified into work, shopping, education, home, leisure, business and other. The data has been cross-checked to ensure the consistency and filtered to remove outliers as well as unclear and omitted destination addresses. The final cleaned data set includes 36761 trip records. Besides, the data set also contains socio-demographic characteristics of the volunteers' personal information such as age, job and sex.

Kolmogorov-smirnov (KS) test

Given an observed distribution P(x), we firstly assume that it obeys a certain form , with a set of parameters a1, , whose values are estimated by using the maximum likelihood method21. The standard KS distance is defined as the maximal distance between the cumulative density functions of the observed data Pc(x) and the fitting curve Fc(x), namely . We independently sample a set of data points according to Fc(x), such that the number of sampled data points is the same as the number of observed data points, and then calculate the maximal distance (denoted by ) between Fc(x) and the cumulative density function of the sampled data points. The p-value is defined as the probability that . In this paper, we always implement 1000 independent runs to estimate the p-value.

Logarithmic binning

The statistical nature of sampling will lead to the increasing noise in the tails of empirical power-law-type distributions. Applying the procedure of logarithmic binning54 can smooth the noisy tail. Logarithmic binning is a procedure of averaging the data that fall in the specific bins whose size increases exponentially. For each bin the observed value are normalized by dividing by the bin width and the total number of observations (see Fig. 3).

References

  1. 1.

    , & Understanding individual human mobility patterns. Nature 453, 779–782 (2008).

  2. 2.

    , & Predictability of population displacement after the 2010 Haiti earthquake. Proc. Natl. Acad. Sci. U.S.A. 109, 11576–11581 (2012).

  3. 3.

    , & Characterizing the human mobility pattern in a large street network. Phys. Rev. E 80, 021136 (2009).

  4. 4.

    , , & Building realistic mobility models from coarse-grained traces. in Proc. of the ACM MobiSys'06 (Uppsala, Sweden), pp 177–190 (2006).

  5. 5.

    Predicting the behavior of techno-social systems. Science 325, 425–428 (2009).

  6. 6.

    Spatial networks. Phys. Rep. 499, 1–101 (2011).

  7. 7.

    & Phase transitions in contagion processes mediated by recurrent mobility patterns. Nat. Phys. 7, 581–586 (2011).

  8. 8.

    , & Natural human mobility patterns and spatial spread of infectious diseases. Phys. Rev. X 1, 011001 (2011).

  9. 9.

    & Impact of travel patterns on epidemic dynamics in heterogeneous spatial metapopulation networks. Phys. Rev. E 79, 016111 (2009).

  10. 10.

    , & Epidemic variability in hierarchical geographical networks with human activity patterns. Chaos 22, 023150 (2012).

  11. 11.

    & Embedding economies of scale concepts for hub networks design. J. Transp. Geogr. 9, 255–265 (2001).

  12. 12.

    , , , & Scaling laws between population and facility densities. Proc. Natl. Acad. Sci. U.S.A. 106, 14236–14240 (2009).

  13. 13.

    , , & Collaborative location and activity recommendations with GPS history data. in Proceedings of the 19th International Conference on World Wide Web (New York, ACM Press), pp 1029–1038 (2010).

  14. 14.

    , , & Personalised travel recommendation based on location co-occurrence. arXiv:1106.5213.

  15. 15.

    , & Exploiting place features in link prediction on location-based social networks. in Proc. of the ACM KDD'11 (New York, ACM Press), pp 1046–1054 (2011).

  16. 16.

    , & The scaling laws of human travel. Nature 439, 462–465 (2006).

  17. 17.

    , , & Origin of the scaling law in human mobility: hierarchy of traffic systems. Phys. Rev. E 83, 036117 (2011).

  18. 18.

    , , & Modelling the scaling properties of human mobility. Nat. Phys. 6, 818–823 (2010).

  19. 19.

    , & Variation in individual walking behavior creates the impression of a Lévy flight. Proc. Natl. Acad. Sci. U.S.A. 108, 8704–8707 (2011).

  20. 20.

    , & Precision of geocoded locations and network distance estimates. J. Transport. Stat. 8, 1–15 (2005).

  21. 21.

    , & Power-law distributions in empirical data. SIAM Rev. 51, 661–703 (2009).

  22. 22.

    , , & Limits of predictability in human mobility. Science 327, 1018–1021 (2010).

  23. 23.

    Equilibrium and Nonequilibrium Statistical Mechanics (New York: John Wiley) (1975).

  24. 24.

    Travel networks. in Handbook of Transport Modelling, eds Hensher D. A. & Button K. J. (New York: Pergamon), pp 165–180 (2000).

  25. 25.

    , , & On the relationship between travel time and travel distance of commuters. Ann. Reg. Sci. 33, 269–287 (1999).

  26. 26.

    , & A study on tour time planning of domestic sightseeing travel itineraries. Hum. Geogr. 20, 51–56 (2005).

  27. 27.

    & Transport costs, location and the economy. in Location and Competition, eds Brakman S. & Garretsen H. (New York: Routledge), pp 32–60 (2005).

  28. 28.

    , , , & Statistical Laws in Urban Mobility from microscopic GPS data in the area of Florence. J. Stat. Mech. P05001 (2010).

  29. 29.

    , , & Structure of urban movements: polycentric activity and entangled hierarchical flows. PLoS ONE 6, e15923 (2011).

  30. 30.

    & Exploring human mobility patterns based on location information of US flights. arXiv:1104.4578v2.

  31. 31.

    , , , & The scaling of human mobility by taxis is exponential. Physica A 391, 2135–2144 (2012).

  32. 32.

    , & Towards a Statistical physics of human mobility. Int. J. Mod. Phys. C 23, 1250061 (2012).

  33. 33.

    , , , & Collective human mobility pattern from taxi trips in urban area. PLoS ONE 7, e34487 (2012).

  34. 34.

    , , & A universal model for mobility and migration patterns. Nature 484, 96–100 (2012).

  35. 35.

    The origin of bursts and heavy tails in human dynamics. Nature 435, 207–211 (2005).

  36. 36.

    , , & A Poissonian explanation for heavy tails in e-mail communication. Proc. Natl. Acad. Sci. U.S.A. 105, 18153–18158 (2008).

  37. 37.

    Conditions for the emergence of scaling in the inter-event time of uncorrelated and seasonal systems. Physica A 369, 877–883 (2006).

  38. 38.

    , , , & Evidence for a bimodal distribution in human communication. Proc. Natl. Acad. Sci. U.S.A. 107, 18803–18808 (2010).

  39. 39.

    , , & Scale invariance in the dynamics of spontaneous behavior. Proc. Natl. Acad. Sci. U.S.A. 109, 10564–10569 (2012).

  40. 40.

    A brief history of generative models for power law and lognormal distributions. Internet Math. 1, 226–251 (2004).

  41. 41.

    Power laws, Pareto distributions and Zipf's law. Contemp. Phys. 46, 323–351 (2005).

  42. 42.

    & Re-inventing Willis. Phys. Rep. 502, 1–35 (2011).

  43. 43.

    On a class of skew distribution functions. Biometrika 42, 425–440 (1955).

  44. 44.

    & Emergence of scaling in random networks. Science 286, 509–512 (1999).

  45. 45.

    , & Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes. Sci. Rep. 3, 1082 (2013).

  46. 46.

    , & Self-organized network evolution coupled to extremal dynamics. Nat. Phys. 3, 813–817 (2007).

  47. 47.

    , , , & Emergence of Scale-Free Leadership Structure in Social Recommender Systems. PLoS ONE 6, e20648 (2011).

  48. 48.

    , , & Self organized scale-free networks from merging and regeneration. Eur. Phys. J. B 43, 369–372 (2005).

  49. 49.

    , & Scale-free networks from optimal design. Eur. Phys. Lett. 43, 369–372 (2002).

  50. 50.

    , , & Animal search strategies: a quantitative random-walk analysis. Ecology 86, 3078–3087 (2005).

  51. 51.

    & Scale-free networks from a Hamiltonian dynamics. Phys. Rev. E 68, 047103 (2003).

  52. 52.

    , , , & Emergent self-organized complex network topology out of stability constraints. Phys. Rev. Lett. 103, 108701 (2009).

  53. 53.

    , , , & Understanding mobility in a social petri dish. Sci. Rep. 2, 457 (2012).

  54. 54.

    Power law distributions in information science: Making the case for logarithmic binning. J. Am. Soc. Inf. Sci. Tec. 61, 2417–2425 (2010).

Download references

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (NNSFC) under Grant Nos. 11222543, 11205040, 11275186 and 91024026, Program for New Century Excellent Talents in University under Grant No. NCET-11-0070, and Huawei University-Enterprise Cooperation Project under Grant No. YBCB2011057.

Author information

Affiliations

  1. Web Sciences Center, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China

    • Xiao-Yong Yan
    •  & Tao Zhou
  2. Department of Transportation Engineering, Shijiazhuang Tiedao University, Shijiazhuang 050043, P. R. China

    • Xiao-Yong Yan
  3. Department of Systems Science, Beijing Normal University, Beijing 100875, P. R. China

    • Xiao-Yong Yan
  4. Department of Modern Physics, University of Science and Technology of China, Hefei 230026, P. R. China

    • Xiao-Pu Han
    •  & Bing-Hong Wang
  5. Institute of Information Economy and Alibaba Business College, Hangzhou Normal University, Hangzhou 310036, P. R. China

    • Xiao-Pu Han
  6. Microsoft Research Asia, Beijing 100080, P. R. China

    • Tao Zhou

Authors

  1. Search for Xiao-Yong Yan in:

  2. Search for Xiao-Pu Han in:

  3. Search for Bing-Hong Wang in:

  4. Search for Tao Zhou in:

Contributions

X.Y.Y., X.P.H., B.H.W. and T.Z. designed the research. X.Y.Y. performed the experiments, X.Y.Y., X.P.H. and T.Z. analysed the data, X.Y.Y. and T.Z. wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Tao Zhou.

Supplementary information

PDF files

  1. 1.

    Supplementary Information

    Supporting Information

About this article

Publication history

Received

Accepted

Published

DOI

https://doi.org/10.1038/srep02678

Further reading

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.