Abstract
In this study we analyze the travel patterns of 500,000 individuals in Cote d'Ivoire using mobile phone call data records. By measuring the uncertainties of movements using entropy, considering both the frequencies and temporal correlations of individual trajectories, we find that the theoretical maximum predictability is as high as 88%. To verify whether such a theoretical limit can be approached, we implement a series of Markov chain (MC) based models to predict the actual locations visited by each user. Results show that MC models can produce a prediction accuracy of 87% for stationary trajectories and 95% for nonstationary trajectories. Our findings indicate that human mobility is highly dependent on historical behaviors and that the maximum predictability is not only a fundamental theoretical limit for potential predictive power, but also an approachable target for actual prediction accuracy.
Introduction
Studies of mobility patterns and predictions of individual mobility trajectories are important in many research fields, such as mobile computing, epidemic modeling, traffic planning and disaster response^{1,2,3}. Realtime locations visited by individuals are typically collected through mobile devices equipped with globalpositioning system (GPS) capability, mobile phone cell towers, or wireless local area network (WLAN) access points.
Various methods have been proposed to forecast individual trajectories, including Markov chain (MC) models^{4,5}, neural networks^{6}, Bayesian networks^{7} and finite automaton^{8}. Prediction accuracy has been shown to vary according to the algorithm used and the context from which the location data come. For example, in an evaluation of next cell prediction based on more than 6000 users on Dartmouth's campuswide WiFi wireless network, it was found that the best predictor (the model) had an accuracy of about 65–72%^{9}. In another study where mobility traces of six researchers and GPSlocations of 175 individuals were used, the prediction accuracy was shown to be in the range of 70% to 95% with an model^{10,11,12}. On the other hand, in an evaluation of MC models for pedestrianmovement prediction, the prediction accuracy was as low as 2%, 45% and 74.4% for the model, hiddenMarkov model and the mixed MC model, respectively^{13}.
The above studies investigated small numbers of individuals or special populations and the results and practical feasibility of the proposed new predictive algorithms were therefore difficult to generalize to the general population. In addition, it was not clear how well these algorithms performed versus the best possible algorithm that could theoretically be constructed; i.e., what is, for the given data type, the best possible accuracy achievable and how well do the predictive algorithms perform versus such a benchmark? The highest potential accuracy of predictability, termed “maximum predictability” (Π^{max}), is defined by the entropy of information of a person's trajectory (frequency, sequence of location visits, etc.). Π^{max} can be calculated by solving a limiting case of Fano's inequality (a relation derived from calculation of the decrease in information in a noisy information channel)^{14,15,16}.
By measuring Π^{max}, Song et al showed, using a mobile phone dataset of 50,000 users in a highincome country, that there is a 93% potential predictability in user mobility, despite very large differences in travel distances^{17}. Under much more extreme conditions and in a lowincome setting, Lu et al analyzed a complete mobile phone dataset of 2.9 million anonymous subscribers after the earthquake in Haiti in 2010 and found that despite massive population movements and increased travel distances following the earthquake, the predictability of people's movements remained as high as 85%, indicating a fundamental regularity in human mobility^{1}. These findings are promising for the design and improvement of predictive algorithms. However, these studies did not show how close to the maximum potential predictability the accuracy of actual algorithms can come in practice.
In this study we aim to fill this gap in knowledge by measuring the maximum predictability and performance of actual prediction algorithms on a mobile phone data set of 500,000 users from Cote d'Ivoire (CIV), West Africa. We also give an overview of population mobility patterns during the data collection period, which took place after the 2011 civil war. We find that the maximum predictability and regularity in mobility in CIV is high, similar to what was found in studies in Haiti and Europe^{1,17}. The evaluation of practical predictive algorithms on this dataset reveals that the maximum predictability can be approached with MCbased models. Interestingly, we also show that higher order MC models do not generate improved prediction accuracy when compared to a first order MC model.
Results
The mobile phone dataset
Mobility data was provided by the telecom company Orange and derived from call detail records (CDR) from a random sample of 500,000 anonymous Orange mobile phone subscribers, active during December 1, 2011 to April 28, 2012 in CIV. The user's location was provided as the location of the subprefecture (souspréfecture in French) of the mobile phone tower through which the call was routed. CIV is composed of 19 regions, which are further divided into 255 subprefectures (237 of these subprefectures have at least one tower, see Fig. 1). The original CDR contains approximately five million users (1/4 of the total population of CIV)^{18,19}. Detailed description of the data can be found in^{20}.
The number of Orange subscriptions per person varies considerably throughout the country, as does the overall population density in CIV. For example, the region of Lagunes, which includes the economic capital Abidjan, is home to 25% of the CIV population and is the most frequently visited location for 43% percent of the mobile phone subscribers in this dataset (see Fig. 2A). The distribution of the number of location updates (calls and SMS) follows a lognormal distribution (Fig. 2B), with 81.3% having between 100 and 2000 location updates. Seventyseven percent of users had at least one location update per day during two thirds of the data collection period (Fig. 2C). Heterogeneity in visitation patterns was high. While sixteen percent of subscribers were only found in one subprefecture during the period, a few users were registered in more than 50 subprefectures (Fig. 2D).
Overview of mobility and aggregated flows
The absolute change in the number of subscribers in each region is dominated by the changes in the region of Lagunes, where Abidjan is situated (Fig. 3A). Sevenday cyclical patterns (workdayweekend cycles) are clearly visible for several regions, e.g. Lagunes and SudComoe, but other more complex trends are also evident. An irregular change in population flow was observed near the end of March and early April when the numbers of users rapidly increased in Abidjan then decreased a few days later (potentially partly related to Easter). In addition, BasSassandra, in the southwest experienced a decrease of users during large parts of the period studied here.
In relative terms (Fig. 3B), several regions showed considerable changes over the period of time analyzed, dominated by Denguele, which showed a small change in absolute terms (see Fig. 3A). As we see from Fig. 3C, both the distribution of average travel distances, and the radius of gyration, r_{g}, (see Methods for definition) illustrate a skewed decay over increasing traveling distances. While the movements of the vast majority of users were confined within an area of 10 km, a few users traveled on average as far as 100 to 300 km per day (Fig. 3D). Note that the radius of gyration is calculated from location data on the level of the subprefecture and thus excludes short movements.
Regularity and potential predictability
We now focus on the regularity of the daily observed trajectories of the users by allocating the last observed location (subprefecture) to each user's trajectory. To avoid the illusion of high predictability stemming from users with many unknown locations, or from users who never traveled to other locations, we include users who visited at least two subprefectures and were observable for more than 120 days in the period (208,288 users).
The distributions of users' random entropy (S^{rand}), temporaluncorrelated entropy (S^{unc}) and trueentropy (S^{real}) are presented in Fig. 4A (see Methods). We can see that, consistent with findings from previous studies, the entropy of visited locations is greatly reduced if we consider both the spatial and temporal correlation of the visit sequences. The median value of S^{rand} is 2.0, indicating that if we assume that individuals randomly choose a location to visit the next day, a typical individual could be found in any of 2^{2.0} ≈ 4 locations. On the other hand, if we use information contained in the frequency and sequence order of the trajectory of individuals, the uncertainty in a typical individual's whereabouts reduces to and , in less than two locations.
Not surprisingly, the reduced uncertainty leads to increased maximum predictability, as shown in Fig. 4B. If information is available only on the number of unique locations visited, L_{i}, the accuracy of any predictive algorithm cannot exceed 0.35. With the additional information on frequency and temporal correlation, the average predictability increases to <Π^{unc}> ≈ 0.84 and <Π^{max}> ≈ 0.88, respectively. Additionally, we evaluated the sensitivity of our entropy and predictability measures to the sampling rate of the data, without finding any important biases (see Supporting Information S1).
In Fig. 4C, we investigated the correlation between the radius of gyration and the average predictability, <Π>. There is a steady decrease of <Π^{rand}> and <Π^{unc}> when r_{g} increases (measured based on the centroid of each subprefecture). On the other hand, <Π^{max}> stays around 0.85 for a wide range of r_{g} ∈ [20, 300]. This finding is consistent with previous studies, revealing the independence of predictability on travel distance in human mobility^{1,17}. However, we have also examined other travel distance measurements. Increases in average travel distance () cause a slight decrease in predictability. <Π^{max}> ranging from 0.9 to 0.7 when increases from 1 to 20 km and stays around 0.63 0.68 for . However, interestingly, predictability decreases considerably with an increasing number of unique locations visited. From Fig. 4E, we can see that the average predictability <Π^{unc}> and <Π^{max}> decays almost linearly with the number of unique locations visited.
Prediction accuracy based on Markovchain models
The predictability analysis in the previous section reveals that, by combining information on frequency with temporal correlation of the trajectory, the theoretical upper bound of prediction accuracy can get as high as 0.88. However, the largest prediction accuracy that can be achieved with properly designed predictive algorithms is not given directly by this measure. In this section, we use MC(n) models to predict the location of users on each day, by considering all previous data points in the trajectory (see Methods). The accuracy of these models is presented in Fig. 5A and shows accuracies of more than 0.8 for almost all days. The accuracy of MCbased models (<γ> ≈ 0.91), MC(1) toMC(7), produce substantially higher accuracies than the estimation method based only on frequency information, i.e., MC(0) (<γ> ≈ 0.85).
There is however little difference between the performance of MCbased models of different orders. At the beginning of the period when historical information is limited, the accuracy of MC(1) is slightly higher than the other MC models, however this difference becomes very small when the historical trajectory is over 100 data points.
Another interesting finding from Fig. 5A, is that the MCbased models perform more robustly than MC(0). For example, during the later period of the data, there is a sharp decrease in the accuracy of MC(0) (from 0.88 to 0.77), while the accuracy of MCbased models shows a much smaller decrease, from 0.92 to 0.87. Irruptions of decreased accuracy from the MC(0) model indicate that people moved abnormally from their regular patterns. The sustainability of MCbased models reveals that such abnormalities can be captured partly by considering the temporal correlation of visiting sequences in the trajectories.
The increase of <γ> over the observation period is not very apparent from Fig. 5A, as <γ> is calculated based on a combination of users with long and short historical trajectories. To investigate the effect of trajectory length on the performance of algorithms, we removed, for each user, the unknown locations and calculated the average prediction accuracy for users with valid historical trajectories of the same length L_{hist}. The results (Fig. 5B) show that the accuracy of MC(0) approaches relative stability after around 15 historical data points. For a wide range of L_{hsit} ∈ [15, 120], <γ> is steady around 0.85, indicating that the visiting behavior on frequency is relatively stable over time for users with valid historical trajectories of this range. On the other hand, there is a steady increase of <γ> for the MCbased models. When the available historical trajectories contain more than 100 data points, the average prediction accuracy climbs to over 0.9.
The performance of MCbased models indicates that, while the predictability of a typical user is Π^{max} = 0.88, which gives an upper bound for the accuracy of any predictive algorithm when the trajectory is stable, the MCbased models are able to produce estimates as high as 0.91, even higher than the theoretical upper limit. Possible reasons for why the practical algorithm can produce accuracies higher than Π^{max} could be that the trajectory data contains only one data point for each day, which means that the maximum length of the trajectory can only be 150 and the movement patterns of individuals may not yet have stabilized. To investigate the effect of stability on the performance of prediction algorithms, we use the Geweke diagnostic^{21,22} to classify X_{i} into stationary and nonstationary trajectories (see Methods). In Fig. 6, we can see that there is a clear difference in the prediction accuracy of MC models between stationary and nonstationary trajectories: after 50 historical observations, the average prediction rate is about 0.95 for nonstationary trajectories and only 0.87 for stationary trajectories. This finding confirms that, given that the trajectory is stationary, the maximum predictability, Π^{max}, provides an upper bound of accuracy for any prediction algorithm. However, for nonstationary trajectories we show that prediction accuracy can greatly surpass the maximum predictability.
MCmodels considering higher orders (longer correlations of previous locations) do not necessarily improve prediction accuracy. For example, for trajectories with the same historical length, the MC(4) model always produces less precise predictions compared to other MCmodels (Fig. 5B). This finding is consistent with previous studies, in which the MC(n > 2) model was found to not bring important improvement at the cost of a significant overhead in terms of computation and space for the learning and storing of the mobility model^{9,12}. It is worth noting that a large part of the predictive power of the studied prediction algorithm is due to the fact that many individuals spent a substantial time in his/her top visited locations. For example, users who visited four distinct subprefectures, still spent almost 80% of their time in their most visited locations (Fig. 5C).
Entropy, predictability and prediction accuracy
The evaluation of predictive algorithms above reveal that, for this dataset, which is a combination of stationary and nonstationary trajectories, the maximum predictability Π^{max} can be achieved with a firstorder Markov chain model (MC(1)). In this section, we investigate whether the individual predictability, , is correlated with the accuracy in predicting all the locations when the trajectory increases from 1 to T. We measure the individual prediction accuracy (<γ_{i}>) by the proportion of accurate predictions over all days for each individual (days without any location data are excluded).
First, we check the correlation between prediction accuracy and the disorder in the trajectory, i.e., S^{real}. As we can see from Fig. 7A, <γ_{i}> is highly correlated with the trajectory's entropy; the larger the entropy, the lower the prediction accuracy. The correlation coefficient between S^{real} and <γ_{i}> is as high as −0.849, with p < 0.000. Second, we investigate the correlation between prediction accuracy and the maximum predictability, i.e., . Not surprisingly, also correlates highly with <γ_{i}>, with a correlation coefficient of 0.802, p < 0.000 (Fig. 7B).
The high correlation between predictability and prediction accuracy of the MC(1) model reveals that, as a measurement for disorder and potential predictability, S^{real} and capture the theoretical limits for the predictive analysis of human movement behaviors and provide an approachable upper bound of predictive power for this type of mobility data. More broadly, the approach used here provides an important strategy to evaluate and guide the design and improvement of mobility prediction algorithms.
Discussion
By investigating the movement of 500,000 mobile phone users during the postcivil war period in Cote d'Ivoire, we have found a potential predictability in user mobility as high as 88% in this WestAfrican, lowermiddle income setting. The finding of high predictability is consistent with two previous studies which investigated the mobility patterns of mobile phone users in very different settings, one in a highincome country with stable social conditions^{17} and one in a lowincome country following an extreme natural disaster^{1}. By applying MCbased estimate algorithms, we found that the first order MC model (MC(1)) is able to produce an average predictive accuracy of 91%, with stationary and nonstationary trajectories having a predictive accuracy of 87% and 95% respectively.
One would perhaps assume that Markov chains of second or seventh order would produce nextlocation estimates with higher accuracy, as aggregated flows based on mobile phone data frequently show weekly cycles, see e.g., Fig. 3A, and^{1,2,17}. However, our evaluations on the MC(n) models show that this information is not necessary in this setting. This could be due to the fact that many people in Cote d'Ivoire do not have a twoday weekend and that unplanned journeys are less common in resourcelimited settings, which may be true of travel in general^{23,24}. The trajectories used for prediction contain only the last observed location on each day, which makes it difficult for the time series to reach stability. As we can see, there is a big difference in the predictive power between stationary and nonstationary trajectories, implying that the diagnostics for convergence are critical for drawing conclusions from predictability analysis on human mobility. Nevertheless, we believe that the evaluation of predictive performance on a daily basis is most practical for the longterm investigation of population movements. For the purpose of this study, we have only included the Markov chain based models in the evaluation of predictive performance of algorithms. Future studies may want to compare other predictive algorithms, such as dynamic Bayesian networks (DBN)^{25}, neural networks^{6} and finite automaton^{8} and to evaluate the feasibility of predicting aggregated population movements with individualbased travel behavior models.
In summary, this paper is, to the best of our knowledge, the first attempt to investigate both the maximum predictability and how close to this value practical algorithms can come when applied on a large mobile phone data set. Our results not only show that the predictability of human mobility is high, but also show that this high predictability is achievable for daily population movement predictions. These findings indicate that human mobility behavior is far from random and that individuals' movements are highly influenced by their historical behavior. With a good understanding of individuals' travel patterns, mobility modeling and public policy decision making, such as epidemic modeling, urban planning and traffic design, may be significantly improved.
Methods
Measures of mobility
We use the average travel distance, and the radius of gyration of the trajectory, r_{g}, to measure the mobility property of individuals. Specifically, let M_{i} = {m_{1}, m_{2}, …, m_{n}} be the sequence of observed location updates for person i during the period of data collection. Then and r_{g} are defined by:
and , where m_{j} − m_{j}_{−1} is the distance between location m_{j} and m_{j}_{−1} and is the center of mass of the trajectory^{26}.
The radius of gyration is different from the average travel distance. Someone who moves in a comparatively confined space will have a small radius of gyration even though he or she covers a large distance. On the other hand, r_{g} can be larger than if someone travels with small steps but in a fixed direction or in a large circle. Note that in the dataset used here we only know the location of each individual by subprefecture, consequently, the centroid of each subprefecture is used to approximate the coordinates of individuals. Such an approximation can introduce imprecision for the measure of travel distances, but still provides useful information when comparing mobility between users, as those who traveled across many subprefectures will have larger r_{g} and than those who spent most of their time in one or two subprefectures.
Measures of entropy and predictability
We are primarily interested in the stable, longterm patterns of population mobility behavior as opposed to shortterm movements. Here we focus on entropy and predictability analysis of daytoday movements of individuals. Let X_{i} = {x_{1}, x_{2}, …, x_{T}} be the sequence of daily locations for person i during the data collection period of T days. x_{j} is the last observed location ID of person i on day j, otherwise we mark x_{j} “unknown”. The uncertainty (or disorder) of the trajectories can be measured by entropy. Larger entropy indicates greater disorder and consequently reduces the predictability of an individual's movements.
Entropy
Following notation in^{17} we measure: (i) the random entropy, , capturing the predictability of each user by assuming that the person's whereabouts are uniformly distributed among L_{i} distinct locations in X_{i}; (ii) the temporaluncorrelated entropy, , where p_{k} is the frequency at which the person visited the k^{th} location among the L_{i} distinct locations. takes into account the number of different locations visited as well as the proportion of time i spent at each location, decreasing the uncertainty of the trajectory, and; (iii) the trueentropy, , where is the probability of finding a subsequence in X_{i}, considering both spatial and temporal patterns.
Predictability
Given the entropy E for an individual i, Fano's inequality gives an upper limit for the predictability of i, i.e., the level of accuracy the best possible predictive algorithm can achieve:
where is given by
and
Let , and , since , it is true that . Comparing between these three predictability measurements provides us with the ability to investigate how the spatial distribution and temporal correlations in an individual's trajectory improve potential predictive power. Since provides the best possible predictive power (because it uses the maximum information from ) we refer to it in this paper as the “maximum predictability”.
Prediction algorithms
Predicting a user's next location using Markov chain models
To investigate how close we can get to achieving Π with actual prediction algorithms we implement several variants of Markov chain (MC) based models.
In an MCbased model, the trajectory of each individual is modeled as a Markov chain of order n, which assumes that the movement of individuals between the L_{i} locations is a process with limited memory in the sense that the future location is visited depending only on the previous n visited location, i.e., , where is a random variable representing the location for individual i at time t.
Given the previous n locations , the prediction is then determined by the transition matrix, P, choosing the destination location x^{pre}(1 ≤ pre ≤ L_{i}) which maximizes the probability:
Increases of the order n in the Markov chain do not necessarily lead to improvement in the prediction accuracy. However, to investigate the correlation of predictive powers with the length of trips to historical locations considered, we vary n from 1 to 7 (one day to one week). If predictions for a higher ordered MC(n) model did not exist (i.e., the order of the previous n locations is unique in history), the prediction from a lower ordered model, MC(n − 1), was used.
The performance of each model was evaluated by the accuracy, γ, which is the proportion of accurate predictions from all predictions made:
Users who were not active on a specific day were excluded from the prediction.
Next place prediction using historical frequency data
For comparison we implemented a simple algorithm predicting the next location based on the most visited location in the historical trajectory: , where p_{k} is defined the same as in S^{unc}. As no temporal correlation is considered in this algorithm, we refer it as MC(0).
Using the MC models, we repeatedly updated the transition matrices and the visiting frequency for each user when new locations were observed in the trajectory. We predicted for each user the most likely location s/he would visit on each day based on all the historical information, i.e., for each day t, the transition matrices and visiting frequency are constructed based on the trajectory from day 1 to day t − 1.
Geweke diagnostic
The Geweke Diagnostic^{21,22} is a test to detect failure of convergence by comparing values in the early part of a Markov chain to those in the latter part of the chain.
Let and , where 1 < n_{1} < n_{a} < n. let n_{2} = n − n_{a} + 1 and define
Then the statistic
converges to a standard normal distribution as n_{1} → ∞ given that the chain is stationary and (n_{1} + n_{2})/n < 1.
In the above equation, and denote consistent spectral density estimates at zero frequency^{21,22} for and , respectively.
Large Z_{n}scores then indicate rejection of the null hypothesis and provide evidence that the chain is nonstationary. For the purpose of this study, we first converted the values of x_{i}_{,t} into unique integers monotonically increasing from 1 and used a significant level of α = 0.05 and let n_{2}/n = 50%. A trajectory is said to be stationary only if it passes all the tests at n_{1}/n = 0.2, n_{1}/n = 0.3, n_{1}/n = 0.4 and n_{1}/n = 0.5. By the end, the proportion of trajectories that passed the Geweke test (stationary trajectories) was 49%, see also supporting figure S2.
References
Lu, X., Bengtsson, L. & Holme, P. Predictability of population displacement after the 2010 haiti earthquake. Proc Natl Acad Sci U S A 109, 11576–81 (2012).
Bengtsson, L., Lu, X., Thorson, A., Garfield, R. & von Schreeb, J. Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: A postearthquake geospatial study in haiti. Plos Medicine 8, (2011).
Kenett, D. Y. & Portugali, J. Population movement under extreme events. Proceedings of the National Academy of Sciences 109, 11472–11473 (2012).
Ross, S. M. Introduction to probability models (Academic press, 2009).
Liu, G. & Maguire Jr, G. A class of mobile motion prediction algorithms for wireless mobile computing and communication. Mobile Networks and Applications 1, 113–121 (1996).
Liou, S.C. & Lu, H.C. Applied neural network for location prediction and resources reservation scheme in wireless networks. In: International Conference on Communication Technology Proceedings, 2003. ICCT 2003, vol. 2, 958–961 (IEEE, 2003).
Akoush, S. & Sameh, A. Mobile user movement prediction using bayesian learning for neural networks. In: Proceedings of the 2007 international conference on Wireless communications and mobile computing 191–196 (ACM, 2007).
Petzold, J., Bagci, F., Trumler, W. & Ungerer, T. Global and local context prediction. (2003).
Song, L., Kotz, D., Jain, R. & He, X. Evaluating nextcell predictors with extensive wifi mobility data. Mobile Computing, IEEE Transactions on 5, 1633–1649 (2006).
Killijian, M.O., Roy, M. & Trédan, G. Beyond san francisco cabs: Building a*lity mining dataset. In: Proceedings of the Workshop on the Analysis of Mobile Phone Networks, 75–78 (2010).
Zheng, Y., Li, Q., Chen, Y., Xie, X. & Ma, W.Y. Understanding mobility based on gps data. In: Proceedings of the 10th international conference on Ubiquitous computing, 312–321 (ACM, 2008).
Gambs, S., Killijian, M.O. & del Prado Cortez, M. N. Next place prediction using mobility markov chains. In: Proceedings of the First Workshop on Measurement, Privacy and Mobility, 3 (ACM, 2012).
Asahara, A., Maruyama, K., Sato, A. & Seto, K. Pedestrianmovement prediction based on mixed markovchain model. (2011).
Kontoyiannis, I., Algoet, P. H., Suhov, Y. M. & Wyner, A. J. Nonparametric entropy estimation for stationary processes and random fields, with applications to english text. Information Theory, IEEE Transactions on 44, 1319–1327 (1998).
Fano, R. M. Transmission of information: A statistical theory of communications. American Journal of Physics 29, 793–794 (1961).
Brabazon, A. & O'Neill, M. Natural computing in computational finance, vol. 1, (Springer, 2008).
Song, C. M., Qu, Z. H., Blumm, N. & Barabasi, A. L. Limits of predictability in human mobility. Science 327, 1018–1021 (2010).
UN OCHA Côte d'Ivoire and Le Comité National de Télédétection et d'Information Gégraphique (CNTIG). Common and fundamental operational datasets registry. (2011). http://cod.humanitarianresponse.info/fr/countryregion/c%C3%B4tedivoire [accessed Febuary 13, 2013].
Bank, T. W. Population total in cote d'ivoire. (2011). http://data.worldbank.org/country/cotedivoire [accessed Febuary 13, 2013].
Blondel, V. D. et al. Data for development: the d4d challenge on mobile phone data. arXiv:1210.0137v2 (2013).
Cowles, M. K. & Carlin, B. P. Markov chain monte carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association 91, 883–904 (1996).
Geweke, J. Evaluating the accuracy of samplingbased approaches to the calculation of posterior moments. In: IN BAYESIAN STATISTICS 169–193 (Oxford: Oxford University Press, 1991).
Rubio, A., FriasMartinez, V., FriasMartinez, E. & Oliver, N. Human mobility in advanced and developing economies: A comparative analysis. In: AAAI Spring Symposia Artificial Intelligence for Development, AID, Stanford, USA (2010).
Wesolowski, A., Eagle, N., Noor, A. M., Snow, R. W. & Buckee, C. O. The impact of biases in mobile phone ownership on estimates of human mobility. Journal of the Royal Society Interface 10, (2013).
Eagle, N., Clauset, A. & Quinn, J. A. Location segmentation, inference and prediction for anticipatory computing. In: AAAI Spring Symposium: Technosocial Predictive Analytics, 20–25 (2009).
Gonzalez, M. C., Hidalgo, C. A. & Barabasi, A. L. Understanding individual human mobility patterns. Nature 453, 779–782 (2008).
Tatem, A. & Linard, C. High spatial resolution data on persons per grid square in cote d'ivoire. (2010). http://www.afripop.org [accessed October 3, 2013].
Acknowledgements
The authors would like to thank the operator France TelecomOrange and the “Data for Development” committee for sharing the mobile phone dataset and organizing the D4D challenge. This paper forms part of the output of the Human Mobility Mapping Project (www.thummp.org) and the AfriPop population mapping project (www.afripop.org). AJT acknowledges funding support from the RAPIDD program of the Science and Technology Directorate, Department of Homeland Security and the Fogarty International Center, National Institutes of Health and is also supported by grants from NIH/NIAID (U19AI089674) and the Bill and Melinda Gates Foundation (#49446 and #1032350). NB acknowledges funding from the Branco Weiss  Society in Science.
Author information
Authors and Affiliations
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Electronic supplementary material
Supplementary Information
Supplementary Information
Rights and permissions
This work is licensed under a Creative Commons AttributionNonCommercialNoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/byncnd/3.0/
About this article
Cite this article
Lu, X., Wetter, E., Bharti, N. et al. Approaching the Limit of Predictability in Human Mobility. Sci Rep 3, 2923 (2013). https://doi.org/10.1038/srep02923
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/srep02923
Further reading

Scale, context, and heterogeneity: the complexity of the social space
Scientific Reports (2022)

The impact of human mobility data scales and processing on movement predictability
Scientific Reports (2021)

A systematic framework of predicting customer revisit with instore sensors
Knowledge and Information Systems (2020)

From networks to optimal higherorder models of complex systems
Nature Physics (2019)

Optimizing utilization in cellular radio networks using mobility data
Optimization and Engineering (2019)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.