The availability of massive digital traces of human whereabouts has offered a series of novel insights on the quantitative patterns characterizing human mobility. In particular, numerous recent studies have lead to an unexpected consensus: the considerable variability in the characteristic travelled distance of individuals coexists with a high degree of predictability of their future locations. Here we shed light on this surprising coexistence by systematically investigating the impact of recurrent mobility on the characteristic distance travelled by individuals. Using both mobile phone and GPS data, we discover the existence of two distinct classes of individuals: returners and explorers. As existing models of human mobility cannot explain the existence of these two classes, we develop more realistic models able to capture the empirical findings. Finally, we show that returners and explorers play a distinct quantifiable role in spreading phenomena and that a correlation exists between their mobility patterns and social interactions.
The availability of massive digital traces of human whereabouts has offered a series of novel insights on the quantitative patterns characterizing human mobility. Indeed, satellite-enabled global positioning systems (GPS) and mobile phone networks allow for sensing and collecting society-wide proxies of human mobility, like the GPS trajectories from vehicles and call detail records (CDR) from mobile phones. This broad social microscope has attracted scientists from diverse disciplines, from physics and network science1,2,3,4 to data mining5,6,7,8, and has fuelled advances from public health9,10,11,12,13,14 to transportation engineering15,16,17, urban planning18,19,20,21, official statistics22,23 and the design of smart cities24,25,26,27. All these studies document a stunning heterogeneity of human travel patterns that coexists with a high degree of predictability28,29: individuals exhibit a broad spectrum of mobility ranges while repeating daily schedules dictated by routine. Here we show that this seemingly conflicting coexistence of heterogeneity and predictability can be understood by quantifying the impact of recurring movements on mobility. To be specific, we analyse mobile call records and GPS tracks of private vehicles, allowing us to compare the overall mobility of an individual with her recurrent, or systematic, mobility. Two distinct mobility profiles emerge in both data sets: returners and explorers.
The characteristic distance travelled by returners, estimated by their radius of gyration2,6, is dominated by their recurrent movement between a few preferred locations. In contrast, recurrent mobility has only a vanishing contribution to the overall mobility of explorers, who have a tendency to wander between a larger number of different locations. We find that these two profiles are well-separated: individuals persistently belong to one or the other of these two classes. We show that current models of human mobility4 cannot account for these two classes of individuals and propose an improved model that can reproduce the mobility patterns of returners and explorers. Finally, we demonstrate that returners and explorers play different roles in spreading processes and that a strong correlation exists between the mobility behaviour of individuals and their social interactions.
Data sets and measures
Our first data source is an anonymized 3-month-long Global System for Mobile Communications (GSM) record collected by a European carrier for billing and operational purposes. It consists of CDR containing the calls of 67,000 individuals, selected from ∼3 million users provided that they visit more than 2 locations during the observational period and that their average call frequency f is ≥0.5 h−1 (see Supplementary Table 1, Supplementary Note 1). We reconstruct a user’s movements based on the time-ordered list of cell phone towers from which a user made her calls2. Our second data source is a GPS data set that stores information about the trips of∼46,000 vehicles tracked during 1 month (May 2011), which passed through a 250 × 250 km square in central Italy. The visualization of the recorded trajectories demonstrates the complexity of explored mobility patterns (Fig. 1). We assign each origin and destination point of the obtained sub-trajectories to the corresponding Italian census cell, using information provided by the Italian National Institute of Statistics (ISTAT) (see Supplementary Table 2, Supplementary Note 2). We describe the movements of a vehicle by the time-ordered list of census cells where the vehicle stopped.
to characterize the typical distance travelled by an individual. Here L is the set of locations visited by the individual, ri is a two-dimensional vector describing the geographic coordinates of location i; ni is the visitation frequency or the total time spent by the individual in location i; is the total number of visits or time spent, and rcm is the center of mass of the individual.
The most frequented location L1 is the place where an individual is found with the highest probability when stationary, most likely her home. In general, the importance of each location Lk to an individual is defined by its rank, where Lk is the k-th most frequented location (Supplementary Note 3, Supplementary Fig. 1).
Returners and explorers
To understand how the k-th most frequented locations of an individual determine the characteristic distance travelled by her, we define the k-radius of gyration .
as the radius of gyration computed over the k-th most frequented locations L1,…,Lk; is the centre of mass computed on the k-th most frequented locations (Supplementary Note 7, Supplementary Figs 8 and 9); Nk is the sum of the weights assigned to the k-th most frequented locations ( if k≥N). Thus, represents the mobility range restricted to the k-th most frequented locations. For example, if an individual’s , then her characteristic travelled distance is dominated by the two most frequented locations. Conversely, if the is much smaller than the total rg the two most frequented locations do not offer an accurate characterization of the individual’s travel pattern and we need to consider more locations.
To investigate the role of the k-th most frequented locations for an individual’s mobility pattern, we compare the probability distributions of total rg and for k=2,…,10 for the GSM and the GPS data (Fig. 2). All curves are long-tailed, indicating that most individuals cover small distances, but a few travel regularly over hundreds of kilometers (heterogeneity). We fit the distributions using the truncated power law2,6 (Fig. 2), finding two significant differences. First, the exponent α of the distribution of k-radii is significantly higher than the exponent of the total rg (see Table 1). Second, the exponential cutoff parameter rcut is larger for small k (see Table 1). Obviously, as k increases the curve approaches the total rg distribution.
The correlation between total radius and k-radius of gyration allows us to quantify the degree of similarity between overall and recurrent mobility. Figure 3 compares total rg and of each individual for k=2, 4, 8, indicating that the population splits into two distinct classes. The data points concentrated around the diagonal correspond to individuals whose total rg is comparable to , indicating that their characteristic travelled distance is dominated by their k-th most frequented locations. We call them k-returners, as their mobility range is well-approximated by their k-th most frequented locations. The points concentrated around the abscissa correspond to individuals whose is considerably smaller than total rg, indicating that we cannot reduce their mobility to k locations; we call them k-explorers. For example, the characteristic travelled distance of a two-returner is mainly determined by the two most frequented locations, typically corresponding to her home and work. In contrast, a two-explorer travels recurrently between many different locations.
The separation between the two classes is especially clear for high radii of gyration, as for high total rg we find very few points between the diagonal and the abscissa in Fig. 3. Yet, as the insets show, the split into the two classes is valid for smaller total rg as well. The number of k-returners increases with k (Supplementary Fig. 3), and when k equals the total number of visited locations each individual becomes a returner. Note that while explorers gradually become returners as k increases, the opposite process is extremely rare (see Supplementary Note 8, Supplementary Fig. 10). The partition of individuals into returners or explorers observed in both the GSM and the GPS data is not due to confounding variables like the heterogeneity of the number of calls or the demography of the municipality of residence (see Supplementary Note 6, Supplementary Fig. 7).
We develop three algorithms to split the population into k-returners and k-explorers: the bisector method classifies an individual as a k-returner if or a k-explorer otherwise; a support vector machine classifier and the expectation-maximization clustering algorithm30 extract the two patterns from the population (see Supplementary Note 4, Supplementary Fig. 2). The three methods produce similar results, indicating that the two classes are clearly separated and well-defined. Consequently, in the following we use the simpler bisector method to split the population into k-returners and k-explorers.
The ratio measures the impact of an individual’s recurrent mobility on her overall mobility: the higher the ratio the higher is the weight of the top k locations in the trajectories of an individual. Figure 4 shows the probability distribution of the sk ratio for different k. We observe two peaks: the peak located at sk=0 corresponds to k-explorers, whose k-radius is significantly smaller than the total rg; the peak at sk=1 corresponds to the k-returners, individuals whose is very similar to the total rg. Note that only for a few individuals sk>1 (that is ), suggesting that for the great majority of the individuals the k-th most frequented locations are on average closer to the centre of mass than their remaining less frequented locations (see Supplementary Note 9, Supplementary Fig. 11). By increasing k, the k-explorers gradually become k-returners, causing the explorers and returners peaks to decrease and increase, respectively. The population reaches a balance of k-returners and k-explorers for k=4 for GSM. In the GPS data, regardless of k, we always have more k-returners than k-explorers. A possible reason is that GPS data only contains trips made by private vehicles, hence missing long distance trip locations less frequented by a particular individual, reached by train or plane. These trips increase the total rg without affecting the . Neglecting these trips results in a lower estimate of an individual’s total rg, increasing the chance to classify her as a returner.
Returners and explorers are also characterized by a different spatial distribution of the visited locations. Figure 5 shows some representative examples of individual mobility networks3,31 of two-returners and two-explorers with different total rg. For both profiles, the visited locations tend to group in dense clusters with few outliers (see Supplementary Note 10, Supplementary Fig. 12). For two-returners the distance between the two most frequented locations is proportional to the total rg; in contrast, for two-explorers the distance between the two most frequented locations is much smaller than the total rg, whose magnitude is mostly determined by less frequented locations far from the centre of mass. Indeed, the distance between the two most frequented locations grows with total rg more rapidly for returners than explorers (see Supplementary Note 5, Supplementary Fig. 4), and while the locations visited by two-returners are clustered around their two most frequented locations those visited by two-explorers are more spread out (see Supplementary Note 5, Supplementary Figs 5 and 6). The higher the total radius of gyration, the more obvious is the difference between the two profiles.
We compare our findings with the results produced by the exploration and preferential return (EPR) individual mobility model4, a state-of-the-art model that accurately captures the visitation frequency of locations, the distribution of the radius of gyration across the population and its growth with time (ultraslow diffusion). The model incorporates two competing mechanisms, the exploration of new locations and the return to previously visited locations. We use the EPR model to simulate the mobility of 67,000 synthetic individuals (see Box 1, and Supplementary Notes 11 and 12) and computed for each synthetic individual the total rg and . As shown in Fig. 6a although for k=2 there is a weak tendency for points to gather around the diagonal, the empirically observed split into returners and explorers is absent from the model trajectories (see Supplementary Fig. 13). The difference between the empirical and synthetic data is especially clear when we explore P(sk) (Fig. 6b versus Fig. 4). For small k, in the model k-explorers (with the ratio sk≈0) dominate the population. For k≈60, we have the perfect balance between k-returners and k-explorers as for the GSM data set for k=4 (Fig. 4b, Supplementary Fig. 14). Thus, the EPR model overestimates by more than an order of magnitude the number of locations needed to accurately estimate the total radius of gyration. Contrarily to the empirical results, in the EPR model there is no significant correlation between total rg and the sum of the distances of the k-th most visited locations (Pearson correlation coefficient is close to zero), neither for k-returners nor for k-explorers (see Supplementary Fig. 15a,b).
The observed discrepancies between the empirical data and the EPR model could arise from the fact that in the model individuals can travel arbitrarily large distances, increasing their total rg with each jump. To correct for this limitation, we propose the d-EPR model, in which an individual selects a new location to visit depending on both its distance from the current position, as well as its relevance measured as the overall number of calls placed by all individuals from that location. We use the gravity model32,33 to assign the probability of a trip between any two locations, which automatically constrains individuals within the country’s boundaries (see Supplementary Notes 11, 13 and 14, Supplementary Fig. 16). This modification is justified by the accuracy of the gravity model to estimate origin-destination matrices at the country level34,35,36,37. The obtained d-EPR model generates trajectories that are in much better agreement with the empirical data: the balance between k-returners and k-explorers in the population is reached at k≈9, in contrast with k≈60 in the original EPR model (Fig. 6f), closer to k=4 in GSM and k=2 in GPS (Fig. 3). Consequently, the correlation plot of versus total rg displays the empirically observed split into returners and explorers (Fig. 3 even at k=2, Fig. 6d). The correlation between total rg and the distance between the most visited locations is much higher than in the original EPR model and closer to the values of GSM and GPS data (see Supplementary Fig. 15e,f).
Hence, the d-EPR model of human mobility reproduces the key features of the aggregated mobility patterns in a confined geographical space, accounting for the two classes of individuals, returners and explorers. The mechanism underlying the model can be easily understood: when a traveller returns, she is attracted to previously visited places with a force that depends on the relevance of such places at an individual level. In contrast when a traveller explores, she is attracted to new places with a force that depends on the relevance of such places at a collective level.
The relevance of returners and explorers dichotomy
Our findings are particularly relevant in two contexts: the geographical spreading of epidemics and social interactions. The geographical spreading of an epidemic is a direct consequence of individuals’ movements9,12,38,39. From the ‘patient zero’ (that is, the first infected individual), the virus is passed on to individuals who come into contact with them, contributing to the rapid growth of the epidemic. Obviously, the wider the range of mobility, the faster will the virus diffuse over the population. The question is, how does the presence of the two mobility profiles uncovered above affect the spreading pattern? To test this, we split the mobility history of an individual into time periods, and captured the trajectory’s reach up to time t using three measures: (i) the number of locations visited; (ii) the area covered; and (iii) the total radius of gyration rg(t). We observe that the trajectory of explorers is distributed over a larger territory, as they visit more locations, cover a larger geographic area and have a higher rg(t) with respect to returners. This pattern applies both for GSM and GPS data (see Supplementary Note 15, Supplementary Fig. 17). We also assess the different role the returners and explorers play in diffusion and spreading processes by considering the global mobility networks generated by individual mobility. The global mobility network is a graph whose nodes are locations and edges indicate the existence of at least one trip between two locations. To be specific, we focus on Tuscany, estimating the mobility of each individual through the GPS data and the number of residents in the locations through the official census cells provided by the ISTAT. We build 10 global mobility networks considering the trips of 10,000 randomly selected individuals, choosing different proportions of two-returners and two-explorers (0%, 10%,…, 100% of two-explorers in the random population). For each network, we compute the global invasion threshold R* under the assumption of a diffusion dynamics with large subpopulations and a low reproductive number (that is, close to the subpopulation epidemic threshold)40 (Supplementary Note 16). In a metapopulation network, an epidemic can spread and invade the system only if R*>1, and this global invasion threshold is affected by the topological fluctuations of the network’s degree: the larger the degree heterogeneity, the higher the R* and therefore the higher is the chance that the epidemic will globally invade the metapopulation. We compute each of the 10 networks 1,000 times, randomly choosing 10,000 individuals with different proportion of two-returners and two-explorers, and obtaining 1,000 values for the invasion threshold for each network (Supplementary Note 16, Supplementary Fig. 18). We observe that the mean diffusion invasion threshold increases with the fraction of explorers in the random population. Although more refined metapopulation infection models are needed to provide accurate estimates of invasion probabilities, our analysis reveals a clear distinction between the diffusion properties of the returners and explorers’ mobility networks.
Recent advances in characterizing the signature41 or strategies42 of social interactions and the possibility to exploit the information on an individual’s social ties to predict her future locations43,44 demonstrate a strong connection between social interactions and human mobility patterns. Here we bring a further contribution by showing that individuals of the two profiles, returners and explorers, tend to engage in social interactions preferably with individuals of the same profile. In other words, individuals who communicate with each other are more likely to belong to the same mobility group than by chance. In particular, we find that the fraction of two-returners whose ‘best friend’ (that is, the most called contact) is also a two-returner is RR≈0.27. We compare this figure with the highest fraction of two-returners best friends obtained from 100,000 randomized experiments where we randomly reassign each individual’s best friend, obtaining RRrand≈0.21, resulting in a highly significant P value (<10−5), as shown in Fig. 7a,b. The same applies to two-explorers (EE≈0.81, EErand≈0.78), as shown in Fig. 7c,d. As we consider the n-th most called contact and compare the fraction of individuals with the n-th best friend in the same mobility group, we find that the observed fractions are significantly higher than those obtained by chance for all n up to 15, as shown in Fig. 8. Our findings reveal the existence of a strong correlation between the mobility behaviour of individuals and their social relationships, although further experiments are needed to understand whether this can be interpreted as a homophily or influence effect.
Here we report the existence of two distinct profiles characterizing human mobility: returners and explorers. Returners limit much of their mobility to a few locations, hence their recurrent and overall mobility are comparable. In contrast, the mobility of explorers cannot be reduced to few locations. These patterns cannot be explained by the EPR model of human mobility, unable to distinguish returners from explorers. We show that by incorporating a gravity model into the EPR mechanism, we can recover the two classes, the obtained extended model coming closer to the empirical observations characterizing the two profiles. The returner/explorer dichotomy has a strong impact on spreading and social interactions. We show that explorers and returners play different roles in the disease spreading and that they tend to engage in social interactions with individuals with similar mobility profiles. The emerging profiles of returners and explorers offer another step towards deriving accurate models of human mobility, capable of generating realistic simulations, predictions and what-if reasoning in context such as energy consumption, gas emission and urban planning45.
How to cite this article: Pappalardo, L. et al. Returners and explorers dichotomy in human mobility. Nat. Commun. 6:8166 doi: 10.1038/ncomms9166 (2015).
We wish to thank Octo Telematics Srl (http://www.octotelematics.com/) for providing the GPS data. The research reported in this article has been partially supported by the European Union under the FP7-ICT Program: Project DataSim n. FP7-ICT-270833 (http://www.datasim-fp7.eu/). This work was also partially funded by the European Community's H2020 Program under the funding scheme ‘FETPROACT-1-2014: Global Systems Science (GSS)’, grant agreement #641191 ‘CIMPLEX: Bringing CItizens, Models and Data together in Participatory, Interactive SociaL EXploratories’ (https://www.cimplex-project.eu/). A.-L.B. was also supported by DTRA grant HDTRA1-10-0100 and by FET OPEN Multiplex #317532.
Supplementary Figures 1-18, Supplementary Tables 1-2, Supplementary Notes 1-16 and Supplementary References