## Abstract

Understanding the patterns of mobility of individuals is crucial for a number of reasons, from city planning to disaster management. There are two common ways of quantifying the amount of travel between locations: by direct observations that often involve privacy issues, e.g., tracking mobile phone locations, or by estimations from models. Typically, such models build on accurate knowledge of the population size at each location. However, when this information is not readily available, their applicability is rather limited. As mobile phones are ubiquitous, our aim is to investigate if mobility patterns can be inferred from aggregated mobile phone call data alone. Using data released by Orange for Ivory Coast, we show that human mobility is well predicted by a simple model based on the frequency of mobile phone calls between two locations and their geographical distance. We argue that the strength of the model comes from directly incorporating the social dimension of mobility. Furthermore, as only aggregated call data is required, the model helps to avoid potential privacy problems.

## Introduction

People travel and move for a variety of reasons, including social, economic and political factors. While individuals may follow simple, recurrent patterns of movement, e.g., daily commuting, a more complex picture emerges when all trajectories of a population are assembled together^{1}. Understanding the principles governing individual and collective movement is important for a number of reasons: for planning urban design^{2}, for forecasting and avoiding traffic congestion^{3}, for mitigating infectious disease^{4,5,6} and for contingency planning in extreme situations caused by disasters^{7,8}. However, accurately determining the movement patterns in a population is cumbersome and costly and involves privacy issues.

There are two ways of inferring the mobility patterns in a population: by direct measurement or by models that predict population movement based on other observed data. Regarding the former, tracking the movement of individuals using location data from mobile phones^{9,10,11} has emerged as a powerful alternative to traditional methods such as traffic surveys^{12}. In this case, the data set comes from the billing systems of mobile phone operators, where the closest tower of each phone is recorded when a mobile phone is used. The resolution problems caused by this are compensated by the large quantity and high quality of data^{13,14}. However, there are drawbacks to this approach: tracking the locations of individuals may be seen as a threat to privacy even when the data is properly anonymised^{15}.

The alternative approach to direct measurement is to use models that predict the average population behaviour from (publicly) available information, such as census and population data. Perhaps the most famous example is the gravity model^{16,17,18} that has been used to predict the intensity of a number of human interactions, including population movement^{19,20,21} and mobile phone calls between cities^{22}. In the gravity model, the intensity of interactions between two locations (e.g., cities) is determined by their populations and distance (with proper scaling exponents). Recently, it has been shown that a parameter-free model, the radiation model^{23}, is able to predict mobility patterns with improved accuracy; this model requires geospatial information on population size as an input.

The applicability of the above-mentioned models is constrained by the availability of accurate population information. This may become a problem, for example, in developing countries, where census data may be incomplete. However, mobile phones are ubiquitous almost everywhere and one might expect that mobile phone calls reflect the social dimension of mobility – the number of social ties between geospatial locations can be expected to influence travel patterns. Therefore, the aim of this paper is to predict mobility patterns from mobile phone call data alone and examine models that would be applicable in a setting where accurate, up-to-date population information is not available. Furthermore, we focus on models that only require aggregated call data, without needing to track individual users. This has the obvious benefit of mitigating privacy-related issues; additionally, the volume of required input data is smaller and the aggregation can be easily done by the mobile operator that owns the source data.

Our modelling and analysis are based purely on the Ivory Coast mobile telephone data set^{24}, originally released by Orange for the Data for Development Challenge. This data set includes two kinds of information: mobile phone calls aggregated at the tower level during 140 days, used as inputs for the models, and data on the trajectories of randomly chosen individuals, used for developing the models and testing their accuracy. There is no accurate, up-to-date geospatial population information for Ivory Coast; the last census was conducted in 1998 and there is no data available on mobility or migration within the country. In contrast, the telephone system in Ivory Coast is well-developed by African standards with mobile phone penetration above 83%^{25}.

This paper is structured as follows: first, we examine gravity laws for average mobility and call frequency between locations. We then proceed to show that mobility between two locations can be directly estimated from the number of calls between the locations and their distance. This holds at two levels of coarse-graining: between tower locations in a major city and between cities. Finally, we study the accuracy of predictions for individual pairs of locations, beyond averages, and show that the number of calls between locations appears to be a good predictor of the frequency of travel between them. For reference, we also study variants of existing mobility models (the gravity and radiation models) where location-specific call frequencies are used as inputs instead of population data; despite applying these models beyond their intended range, they provide fairly good predictions on average.

## Results

### Data set and coarse-graining

The data set comes in two parts: (i) the number of calls between 1231 Orange towers in Ivory Coast for 5 months and (ii) ten data sets on two-week individual trajectories of 50,000 randomly chosen users. From the trajectories, we aggregated the mobility *m*_{ij} between locations *i* and *j* by counting direct movements along the trajectories (see Methods for further details).

As it is reasonable to assume that communication and mobility patterns are in general different for short and long distances, we aggregated the data at two levels: (i) tower level for intra-city behaviour and (ii) city level for inter-city behaviour. The intra-city analysis consists of 5.1 million movements and 109 million calls between all 298 towers located inside Abidjan, the largest city of Ivory Coast, during 140 days. This comprises 31% of all calls and 50% of all movements in the country. In this analysis the geographical unit – referred to as “location” in the following – is the area covered by a single tower. To analyse inter-city behaviour, we aggregated towers that lie within a city boundary and considered calls and mobility between cities. The resulting data contains 143 cities with 63 million calls and 374 thousand movements between them during 140 days. At both levels of analysis, we determine the number of calls, the number of movements and the geographical distance between every pair of locations (towers, cities). See Methods for further details.

### Gravity laws: dependence of mobility and communication intensity on distance

We begin by investigating whether the mobility and communication intensities between two locations follow the gravity law on average. In its general form, the gravity law states that

$$x_{ij} \propto \frac{N_{i} N_{j}}{d_{ij}^{\alpha}}, \tag{1}$$

where *x*_{ij} is the intensity of interaction, e.g., calls, mobility, trade, between locations *i* and *j* associated with populations of sizes *N*_{i} and *N*_{j}, separated by a distance *d*_{ij}^{16,17,18}. The exponent α governs the distance dependence. Note that in the most general form of the gravity law, *N*_{i} and *N*_{j} are also associated with an exponent; here for simplicity we assume a linear dependence. For our data, we study the intensities of mobility *m*_{ij} and communication *c*_{ij} between locations *i* and *j*. These are defined as the average number of weekly movements and calls between them, respectively. As a proxy of the population *N*_{i}, we take the total number of weekly calls *s*_{i} made and received at location *i*.

The variation of the scaled mobility intensity, *m*_{ij}/(*s*_{i}*s*_{j}), with respect to the distance *d*_{ij} is shown in Fig. 1 for the tower and city levels of coarse-graining (panels A and B, respectively). In both cases, the gravity law holds on average and

$$\left\langle \frac{m_{ij}}{s_{i} s_{j}} \right\rangle \propto d_{ij}^{-\gamma}, \tag{2}$$

where γ ≈ 2.14 for the intra-city level and γ ≈ 2.54 for the inter-city level. Panels C and D display a similar plot for the scaled communication intensity that is also seen on average to follow the gravity law:

$$\left\langle \frac{c_{ij}}{s_{i} s_{j}} \right\rangle \propto d_{ij}^{-\delta}, \tag{3}$$

where the distance exponents are δ ≈ 1.20 for the intra-city level and δ ≈ 1.48 for the inter-city level. It is worth noting that both exponents γ and δ are smaller for the intra-city level, indicating differences in communication and travel patterns within and between cities: within a city, the spatial distance appears to play a less important role than it does between cities.
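The text does not say how the exponents were fitted. As a minimal sketch, assuming logarithmic distance bins and an ordinary least-squares fit on the binned averages (both are assumptions, as are the function and parameter names), an exponent such as γ could be estimated from pairwise data as follows:

```python
import math
from collections import defaultdict


def fit_distance_exponent(distances, values, bins_per_decade=5):
    """Estimate gamma in <value> ~ d**(-gamma) from log-binned averages.

    Illustrative only: the binning scheme and the least-squares fit are
    assumptions, not the procedure used in the paper.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, v in zip(distances, values):
        if d <= 0:
            continue  # log-binning needs positive distances
        b = math.floor(math.log10(d) * bins_per_decade)
        sums[b] += v
        counts[b] += 1

    # One (log10 d, log10 <value>) point per bin with a positive mean.
    xs, ys = [], []
    for b in sorted(sums):
        mean_v = sums[b] / counts[b]
        if mean_v > 0:
            xs.append((b + 0.5) / bins_per_decade)  # bin centre in log10(d)
            ys.append(math.log10(mean_v))
    if len(xs) < 2:
        raise ValueError("need at least two non-empty bins to fit a slope")

    # Ordinary least-squares slope of log <value> against log d.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx
```

Here `values` would hold the scaled intensities *m*_{ij}/(*s*_{i}*s*_{j}) or *c*_{ij}/(*s*_{i}*s*_{j}) for each pair of locations. Averaging within bins before taking logarithms matches the statement that the gravity law holds on average rather than for each pair.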

The two gravity laws discussed above suggest that the following relationship might also hold:

$$\left\langle \frac{m_{ij}}{c_{ij}} \right\rangle \propto d_{ij}^{-\beta}, \tag{4}$$

where β = γ − δ. This is indeed the case, as seen in Fig. 1 (E,F) where 〈*m*_{ij}/*c*_{ij}〉 follows a power-law dependence on *d*_{ij}. For both intra- and inter-city levels, we find the exponent β ≈ γ − δ (see Table I). These results suggest that there are two possible ways of inferring the intensity of mobility between locations *i* and *j* from call data: using the distance and either (i) the total call numbers at both locations *s*_{i} and *s*_{j} (Eq. 2), or (ii) the total number of calls between the locations *c*_{ij} (Eq. 4). The prediction accuracy of these two models will be assessed in the section “Prediction accuracy” below.
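The relation β = γ − δ can be checked against the exponents quoted in the text. The step below is heuristic, since the average of a ratio is not in general the ratio of the averages; it only shows what the two gravity laws would imply if that distinction is ignored:

```latex
\frac{m_{ij}}{c_{ij}}
  = \frac{m_{ij}/(s_{i}s_{j})}{c_{ij}/(s_{i}s_{j})}
  \sim \frac{d_{ij}^{-\gamma}}{d_{ij}^{-\delta}}
  = d_{ij}^{-(\gamma-\delta)}

\text{intra-city: } \gamma-\delta \approx 2.14 - 1.20 = 0.94
\qquad
\text{inter-city: } \gamma-\delta \approx 2.54 - 1.48 = 1.06
```

Both differences are close to one, in line with the observation that β ≈ 1 at both levels of coarse-graining.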

It is worth noting that both for intra- and inter-city levels, the exponent β ≈ 1. This does not directly result from Eqs. (2) and (3). One possible argument for the observed value of β is as follows: the cost of a single trip, measured in e.g. time or money, between two towers/cities *i* and *j* can be assumed to depend linearly on their distance, *d*_{ij}. This means that the total cost of all movements between *i* and *j* is proportional to *m*_{ij}*d*_{ij}. The cost of communication, by contrast, is independent of distance. If one further assumes that the total cost of movement is balanced by the total benefit brought by social ties, linearly reflected in *c*_{ij}, we have *m*_{ij}*d*_{ij} ~ *c*_{ij} and thus the value of exponent β = 1. In this interpretation, the communication exponent δ is directly related to a decrease in the number of social ties as a function of distance, whereas γ captures a combination of cost associated with travel and the decrease in the number of social ties.

### Models for estimating mobility based on call data

The results of the previous section indicate that on average, the mobility intensity *m*_{ij} between two locations *i* and *j* can be estimated using the *gravity model*

$$\hat{m}^{G}_{ij} = K_{G}\,\frac{s_{i} s_{j}}{d_{ij}^{\gamma}}, \tag{5}$$

where $K_{G}$ is a normalization constant obtained by equating the total numbers of expected and observed movements, i.e., $\sum_{i,j} \hat{m}^{G}_{ij} = \sum_{i,j} m_{ij}$. This model takes the communication intensities *s*_{i} and *s*_{j} at both locations as inputs in addition to the distance *d*_{ij}. As an alternative we propose the *communication model*

$$\hat{m}^{C}_{ij} = K_{C}\,\frac{c_{ij}}{d_{ij}^{\beta}}, \tag{6}$$

based on the communication intensity *c*_{ij} between the locations. The normalization constant is obtained as before. The values of the exponents γ and β are taken from Table I.
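The two models and their normalization can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names and the dictionary layout (pairs of location labels as keys) are hypothetical, and the normalization constant is taken as the ratio of the observed total to the unnormalized predicted total, as described in the text.

```python
def normalise_to_observed(raw, observed):
    """Rescale raw scores so that predicted and observed totals agree."""
    total_raw = sum(raw.values())
    total_observed = sum(observed.get(pair, 0.0) for pair in raw)
    constant = total_observed / total_raw  # plays the role of K
    return {pair: constant * value for pair, value in raw.items()}


def gravity_model(s, dist, observed, gamma):
    """Predicted mobility proportional to s_i * s_j / d_ij**gamma."""
    raw = {(i, j): s[i] * s[j] / d ** gamma for (i, j), d in dist.items()}
    return normalise_to_observed(raw, observed)


def communication_model(c, dist, observed, beta):
    """Predicted mobility proportional to c_ij / d_ij**beta."""
    raw = {pair: c.get(pair, 0.0) / d ** beta for pair, d in dist.items()}
    return normalise_to_observed(raw, observed)
```

Here `s` maps each location to its weekly call total, `c` and `observed` map location pairs to weekly calls and movements, and `dist` maps the same pairs to distances. Because the constant is fixed by the totals alone, it does not affect the ranking of pairs, only the overall scale.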

For comparison, we also study a modified version of the *radiation model*^{23}, originally designed to predict mobility between locations *i* and *j* with the help of data on population density in the surrounding area. Again, we modify the model such that only call and distance data are required as input. To this end, we assume that the number of calls in a given location is an unbiased estimate of population density, similarly to the gravity model. Note that this assumption may not necessarily hold, since mobile phone penetration may correlate with socioeconomic factors. Further, we assume that the number of trips that begin (end) at location *i* (*j*) is proportional to *s*_{i} (*s*_{j}). Then, the radiation model formula can be rewritten as

$$\hat{m}^{R}_{ij} = K_{R}\, s_{i}\,\frac{s_{i} s_{j}}{(s_{i} + s_{ij})(s_{i} + s_{j} + s_{ij})}. \tag{7}$$

Here *s*_{ij} denotes the total number of calls made within a circle of radius *d*_{ij} centred at *i*, excluding locations *i* and *j*, and $K_{R}$ is a normalization constant.
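The quantity *s*_{ij} is the only input of the modified radiation model that is not read directly from the call data, so a small sketch of its computation may help. The function name and the nested-dictionary layout are hypothetical, and counting locations that lie exactly on the circle is an assumption, since the text does not say how ties are treated.

```python
def calls_within_radius(i, j, s, dist):
    """s_ij: calls at locations k (other than i and j) with d_ik <= d_ij.

    `s` maps location -> total calls; `dist[a][b]` is the distance between
    locations a and b. Including the boundary (<=) is an assumption.
    """
    radius = dist[i][j]
    return sum(
        calls
        for k, calls in s.items()
        if k != i and k != j and dist[i][k] <= radius
    )
```

Note that *s*_{ij} is not symmetric in general: the circle is centred at the origin *i*, so swapping *i* and *j* can change which locations fall inside it.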

### Prediction accuracy

To assess the actual predictive power of the models beyond averages, we compare the actual mobility intensity *m*_{ij}, obtained from the trajectory data set, with the estimates given by the models for each specific pair of locations *i* and *j*. This comparison for the communication model, the gravity model and the radiation model is shown in Fig. 2. The gray dots correspond to predicted versus actual mobility for each pair of locations and the boxes (whiskers) correspond to the region between 25th and 75th (9th and 91st) percentiles.

It is clear from the figure that all models give on average reasonable predictions. However, the gravity and radiation models display higher levels of variance between the predicted and actual mobility intensities. In particular, the prediction accuracy of the gravity model is relatively poor for the inter-city mobility and the radiation model performs the worst for the intra-city mobility. The latter is not surprising, as the radiation model was originally not designed for predicting short-range travel patterns within cities. Further, the original radiation model requires accurate geospatial population information and simply equating population size within an area with the number of calls can be expected to give rise to errors.

The level of observed variance implies that in addition to comparing averages, it is important to compare the expected and observed mobility between individual pairs of locations. As the first step, we determine the Spearman correlation coefficients between the observed mobility *m*_{ij} and the corresponding model prediction $\hat{m}_{ij}$. Table II shows that the correlation is higher for the communication model than for the gravity and radiation models for both levels of coarse-graining (intra-city, inter-city). In general, in terms of the Spearman coefficient, predictions of all models are more accurate for intra-city mobility than for inter-city mobility.
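For readers unfamiliar with the Spearman coefficient, it is the Pearson correlation of the ranks of the two variables, so it measures how well a model orders the location pairs regardless of scale. A minimal standard-library sketch (function names are illustrative; tied values receive their average rank) is:

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks enter, the normalization constants of the models have no effect on this measure.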

Finally, we consider the differences between the observed and predicted mobilities by measuring their relative deviations. For all the three models, we define the relative deviation between the observed *m*_{ij} and the predicted $\hat{m}_{ij}$ as

$$\delta_{ij} = \frac{\hat{m}_{ij} - m_{ij}}{\hat{m}_{ij} + m_{ij}}, \tag{8}$$

where δ_{ij} takes values between −1 and 1. A deviation of δ_{ij} = 0 implies exact prediction by the model for the pair of locations *i* and *j*, whereas negative (positive) values indicate under- (over-) estimations. We only determine δ_{ij} for those pairs of *i* and *j* for which *m*_{ij} ≠ 0.
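A short sketch of this accuracy measure follows. It assumes the deviation is the difference between predicted and observed mobility divided by their sum, which is consistent with the stated range of −1 to 1 and with negative values meaning under-estimation; the function names and the dictionary layout are hypothetical.

```python
def relative_deviation(observed, predicted):
    """(predicted - observed) / (predicted + observed), in [-1, 1]."""
    return (predicted - observed) / (predicted + observed)


def fraction_within(observed, predicted, window=0.25):
    """Share of location pairs whose deviation lies in [-window, window].

    Pairs with zero observed mobility are skipped, as in the text.
    """
    deviations = [
        relative_deviation(m, predicted.get(pair, 0.0))
        for pair, m in observed.items()
        if m != 0
    ]
    if not deviations:
        raise ValueError("no pairs with non-zero observed mobility")
    inside = sum(1 for d in deviations if abs(d) <= window)
    return inside / len(deviations)
```

For example, a prediction three times the observed value gives a deviation of 0.5, and a prediction of zero for an observed non-zero flow gives −1.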

The probability distributions shown in Fig. 3 confirm the above finding that out of the studied three models for inferring mobility from call data, the communication model has the highest accuracy of prediction. The distribution $P(\delta^{C})$ for the communication model is well centred around zero, whereas especially for inter-city mobility the distributions $P(\delta^{G})$ and $P(\delta^{R})$ for the gravity and radiation models show a bias towards under-estimation. In more detail, for intra-city mobility, the fractions of location pairs with deviations δ ∈ [−0.25, 0.25] are 13% for the radiation model, 42% for the gravity model and 51% for the communication model. For inter-city mobility, the corresponding fractions are 20%, 17% and 33%. Note that for the gravity model, in spite of the fact that the average 〈*m*_{ij}/(*s*_{i}*s*_{j})〉 follows a power law in *d*_{ij} (Fig. 1A,B), there is still a significant amount of under-estimation. This indicates that there is a broad distribution of the values of *m*_{ij}/(*s*_{i}*s*_{j}) for a given distance and the average value is not always a good estimator.

## Discussion and conclusion

The goal of this paper has been to investigate simple models that predict the intensities of mobility between two locations on the basis of mobile phone call data and their geospatial distance. The motivation behind this is to provide ways of predicting mobility in situations where accurate information of population size at each location is not available; furthermore, the focus is on aggregated call data, mitigating the need to track movement patterns of individual phone users. Our study is based on call and mobility data released by Orange for Ivory Coast; note that it would be important to verify the findings with data from other countries.

We have tested three models that only take aggregated call data and geospatial information as inputs: the well-known gravity model, the communication model based on the number of calls between two locations and a modified version of the radiation model. While all models on average capture the real mobility patterns derived from call data with location information, a more detailed analysis of the prediction accuracy at the level of individual locations reveals that the communication model is the most accurate out of the three tested models in this setting.

Note that the gravity and radiation models were originally designed to use geospatial population information as input parameters. Since our aim has been to study mobility models in a setting where such information is not available, we have simply taken the number of calls at a given location as a proxy of the population size. Therefore we do not claim that the communication model would outperform other models in a situation where they could be applied as their designers intended. Also note that our modeling target – the mobility pattern – is also derived from mobile phone records and geospatial biases in mobile phone usage might influence the results. Hence, it would be useful to verify the accuracy of the communication model for a case where there are alternative sources of mobility information.

The likely reason why the communication model works well is that it directly incorporates geospatial information on social ties and human relationships. It has been observed earlier that individuals tend to travel to locations where they have social bonds^{8}; furthermore, once under way, it is reasonable to assume that people make calls back home. Because of this, the aggregated intensity of communication between two locations should contain information on the mobility patterns as well. Then, in the first approximation one might assume that the frequency of movement between two locations is directly proportional to the intensity of communication. Further, the simplest way to incorporate the fact that larger distances imply larger travel costs (in terms of time or money) is to assume that mobility is inversely proportional to distance. These two components directly yield the communication model: *m*_{ij} ∝ *c*_{ij}/*d*_{ij}.

It is worth noting that in general, in gravity laws of human interaction, the distance dependence is associated with some exponent α. This is also seen in our analysis of the gravity laws for mobility and communication intensity, where the exponents were seen to depend on the level of coarse-graining, i.e., intra-city or inter-city. However, for both levels, the inverse distance dependence of the communication model is approximately linear, i.e., the exponent equals one. This suggests universality and calls for analysis of similar data sets from different countries.

## Methods

### Communication and mobility data

The data set^{24} consists of 2.5 billion call detail records of customers for a single provider (Orange) in Ivory Coast between December 1st, 2011 and April 28th, 2012. The communication data used in this paper contains the number of calls as well as their aggregated duration between all pairs of 1231 towers, i.e., mobile base stations. The geographical locations of the towers were also provided. The temporal resolution of the data set is one hour.

The mobility sample consists of ten data sets of trajectories of individual users, each for 50,000 randomly chosen users. Each trajectory corresponds to a subscriber's call locations during a two-week period. The locations were recorded every time a call was made and correspond to the position of the tower that transmitted the call. The data sets represent consecutive two-week periods, beginning on December 5, 2011.

### Determining city boundaries

As the locations of the cell-towers were provided, we used reverse geocoding^{26} to determine the city in which the tower is located. The mean longitude and latitude of all towers within a city defines the centre of the city. This location was used to calculate the inter-city distances. Out of the 1231 mobile phone towers, 686 are located within city boundaries (with 298 of them in the largest city, Abidjan). The total number of cities with at least a single tower is 143.
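The city-centre definition above translates directly into code. In the sketch below the centre is the plain mean of tower coordinates, as stated; the great-circle (haversine) distance and the Earth radius of 6371 km are assumptions, since the text does not specify how distances were computed, and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def city_centre(towers):
    """Mean (longitude, latitude) of the towers inside one city."""
    lons = [lon for lon, _ in towers]
    lats = [lat for _, lat in towers]
    return (sum(lons) / len(lons), sum(lats) / len(lats))


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1 = a
    lon2, lat2 = b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))
```

Averaging longitudes and latitudes directly is reasonable for a single city, where the towers span a small area far from the poles and the date line.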

### Determining direct movements

Given the individual trajectories of users, a variety of methods have been developed to extract different aspects of human mobility^{13}. Here, we consider *direct movements* that correspond to any consecutive changes in the location of a user. Formally, direct movements are defined as follows: if the user made a call from location *i* at some time *t* and *j* is the location of the next call at *t*′ > *t*, there is a direct movement from *i* to *j* if *j* ≠ *i*. By aggregating this information for all users, we determine the total number of direct movements between all pairs of locations. The locations can correspond either to towers (intra-city analysis) or to cities (inter-city analysis). Note that for inter-city analysis, only towers located within city boundaries are considered. Thus, all calls and direct movements to locations between cities are ignored.
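The definition of direct movements can be sketched as a single pass over each user's time-ordered calls. The input layout (a mapping from a user identifier to a list of `(time, location)` records) and the function name are hypothetical; the sketch counts directed movements from *i* to *j*.

```python
from collections import Counter


def count_direct_movements(trajectories):
    """Count movements i -> j between consecutive calls of each user.

    `trajectories` maps a user id to a list of (time, location) records.
    A movement is counted whenever two consecutive calls of the same user
    are made from different locations.
    """
    moves = Counter()
    for calls in trajectories.values():
        ordered = sorted(calls)  # order each user's calls by time
        for (_, origin), (_, destination) in zip(ordered, ordered[1:]):
            if origin != destination:
                moves[(origin, destination)] += 1
    return moves
```

If an undirected count per pair is wanted, the two directions can simply be added, e.g. `moves[(i, j)] + moves[(j, i)]`. For the inter-city analysis, tower identifiers would first be mapped to cities and calls from towers outside any city dropped.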

### Data filtering

Users may be located in areas covered by several towers. In this case, the calls made by users at the same location can be handled by different neighbouring towers. This phenomenon of switching of mobile phone calls between towers is called *handover* and it may give rise to artefacts in mobility and communication. For instance, let us consider an *immobile* user located in the boundary area covered by two towers *i* and *j*. If one of the calls of this user was served by tower *i* and the subsequent call by tower *j*, the data will indicate movement of the user from tower *i* to tower *j*. Similarly, the number of calls between neighbouring towers might also get biased. To get rid of this artefact, we excluded all pairs of neighbouring towers from our analysis. As the towers are heterogeneously distributed (higher concentration in densely populated areas and lower concentration in rural zones), neighbouring towers were identified by a distance-independent approach. To do this, we first computed the Voronoi diagram around each tower. The towers having a common edge in their Voronoi cells are defined as the neighbouring towers. We also excluded the communication and mobility between the towers that are located within 1 metre of each other (e.g. two base stations serving a busy area). Further, only pairs of locations with more than one call per day (on average) were considered.
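The three filtering rules can be combined into one predicate per pair of locations. The sketch below assumes that the set of Voronoi neighbours has already been computed and is passed in as a set of `frozenset` pairs; the function name, parameter names and the default of 140 observation days are illustrative choices based on the figures quoted in the text.

```python
def keep_pair(pair, calls, distance_m, neighbours,
              days=140, min_calls_per_day=1.0, min_distance_m=1.0):
    """Return True if a pair of locations survives the filtering rules.

    pair        -- tuple (i, j) of location identifiers
    calls       -- total number of calls between i and j over `days`
    distance_m  -- distance between i and j in metres
    neighbours  -- set of frozenset({i, j}) pairs sharing a Voronoi edge
    """
    if frozenset(pair) in neighbours:
        return False  # possible handover artefact
    if distance_m <= min_distance_m:
        return False  # practically co-located base stations
    return calls / days > min_calls_per_day  # more than one call per day
```

One way to build `neighbours`, offered only as a suggestion, is from the `ridge_points` attribute of `scipy.spatial.Voronoi`, which lists the pairs of input points separated by each Voronoi ridge.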

## References

Brockmann, D., Hufnagel, L. & Geisel, T. The scaling laws of human travel. Nature 439, 462–465 (2006).

Hall, P. Cities of tomorrow: an intellectual history of urban planning and design in the Twentieth century (Blackwell, Massachusetts, 2002).

Helbing, D. Traffic and related self-driven many-particle systems. Rev. Mod. Phys. 73, 1067–1141 (2001).

Balcan, D. et al. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc. Natl. Acad. Sci. U.S.A. 106, 21484–21489 (2009).

Wesolowski, A. et al. Quantifying the impact of human mobility on malaria. Science 338, 267–270 (2012).

Dalziel, B. D., Pourbohloul, B. & Ellner, S. P. Human mobility patterns predict divergent epidemic dynamics among cities. Proc. R. Soc. B 280, 20130763 (2013).

Helbing, D., Farkas, I. & Vicsek, T. Simulating dynamical features of escape panic. Nature 407, 487–490 (2000).

Lu, X., Bengtsson, L. & Holme, P. Predictability of population displacement after the 2010 haiti earthquake. Proc. Natl. Acad. Sci. U.S.A. 109, 11576–11581 (2012).

Gonzalez, M. C., Hidalgo, C. A. & Barabasi, A.-L. Understanding individual human mobility patterns. Nature 453, 779–782 (2008).

Song, C., Qu, Z., Blumm, N. & Barabási, A.-L. Limits of predictability in human mobility. Science 327, 1018–1021 (2010).

Jo, H.-H., Karsai, M., Karikoski, J. & Kaski, K. Spatiotemporal correlations of handset-based service usages. EPJ Data Sci. 1, 10 (2012).

Treiterer, J. Investigation of traffic dynamics by aerial photogrammetry techniques. Tech. Rep. Ohio Department of Transportation EES 278 (1975).

Calabrese, F., Di Lorenzo, G., Liu, L. & Ratti, C. Estimating origin-destination flows using mobile phone location data. Pervasive Computing, IEEE 10, 36–44 (2011).

Tizzoni, M. et al. On the use of human mobility proxy for the modeling of epidemics. *arXiv*, 1309.7272 (2013).

Butler, D. Data sharing threatens privacy. Nature 449, 644 (2007).

Carey, H. C. Principles of social science, vol. 3 (JB Lippincott & Company, 1867).

Carrothers, G. A. An historical review of the gravity and potential concepts of human interaction. J. Am. Inst. Plan. 22, 94–102 (1956).

Anderson, J. E. The gravity model. Annu. Rev. Econ. 3, 133–160 (2011).

Barthélemy, M. Spatial networks. Phys. Rep. 499, 1–101 (2010).

Jung, W.-S., Wang, F. & Stanley, H. E. Gravity model in the korean highway. Europhys. Lett. 81, 48005 (2008).

Thiemann, C., Theis, F., Grady, D., Brune, R. & Brockmann, D. The structure of borders in a small world. PLoS ONE 5, e15422 (2010).

Krings, G., Calabrese, F., Ratti, C. & Blondel, V. D. Urban gravity: a model for inter-city telecommunication flows. J. Stat. Mech. L07003 (2009).

Simini, F., González, M. C., Maritan, A. & Barabási, A.-L. A universal model for mobility and migration patterns. Nature 484, 96–100 (2012).

Blondel, V. D. et al. Data for development: the d4d challenge on mobile phone data. *arXiv*, 1210.0137 (2012).

Cote d ivoire (ivory coast) - telecoms, mobile and broadband - market insights, statistics and forecasts. http://www.budde.com.au/Research/Cote-d-Ivoire-Ivory-Coast-Telecoms-Mobile-and-Broadband-Market-Insights-Statistics-and-Forecasts.html (2014) Date of access: 2014-07-10.

Reverse geocoding. https://developers.google.com/maps/documentation/javascript/examples/geocoding-reverse (2013) Date of access: 2013-01-05.

## Acknowledgements

We thank the operator France Telecom-Orange and the “Data for Development” committee for sharing the mobile phone dataset and organizing the D4D challenge. We acknowledge the support by the Academy of Finland, project no. 260427 (JS, RKP) and Aalto University postdoctoral program (HJ). VP was supported by TEKES (FiDiPro). MM was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia under project no. ON171017. We also acknowledge the computational resources provided by Aalto Science-IT project.

## Author information

### Contributions

V.P., M.M., H.J., J.S. and R.K.P. designed the research and participated in the writing of the manuscript. V.P., M.M., H.J. and R.K.P. analysed the data and performed the research.

## Ethics declarations

### Competing interests

The authors declare no competing financial interests.

## Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/

## About this article

### Cite this article

Palchykov, V., Mitrović, M., Jo, HH. *et al.* Inferring human mobility using communication patterns. *Sci Rep* **4**, 6174 (2014). https://doi.org/10.1038/srep06174
