Correlation between social proximity and mobility similarity

Human behaviors exhibit ubiquitous correlations in many aspects: at individual and collective levels, in temporal and spatial dimensions, and across content, social and geographical layers. As rich Internet data on online behaviors become available, exploring human mobility similarity from the perspective of social network proximity has attracted academic interest. Existing analysis shows a strong correlation between online social proximity and offline mobility similarity: mobility records of friends are significantly more similar than those of strangers, and those of friends with common neighbors are even more similar. We examine the importance of the number and diversity of common friends, with the counterintuitive finding that the number of common friends has no positive impact on mobility similarity while their diversity plays a key role, in disagreement with previous studies. Our analysis provides a novel view for better understanding the coupling between human online and offline behaviors, and will help model and predict human behaviors based on social proximity.

to measure the proximity and similarity between two individuals, whether the structure of social network impacts on one's mobility trajectory, and how to find pairs of individuals with similar behavior patterns.
In this report we perform a finer analysis to demonstrate which social proximity measurements are correlated with mobility similarity between two individuals, and how, based on an LBSN dataset (see Methods) in which people share real-time locations (usually referred to as "check-ins") with online friends. Compared with other kinds of data sources, LBSN data offer large-scale mobility records, locations annotated with descriptive tags, user-driven sparse data and explicit social friendship 18. The LBSN dataset thus offers a bridge connecting social networks and mobility trajectories. Specifically, many types of online social connections can be used to measure social proximity according to different criteria, e.g., whether two individuals are friends, whether they have common friends, and how the common friends are connected. It is well established in complex network research that "befriending" and "having common friends" are strongly related 33. Besides, people visit different locations in their daily life for working, living, entertainment, etc. Two randomly selected individuals may exhibit similar or different visiting patterns in physical space. It has been observed that a variety of demographic attributes, such as gender, age, education background and job, are highly correlated with mobility tracks [34][35][36]. A pair of friends sharing the same personal attribute has a higher probability of behaving similarly. In our work, we find that human mobility similarity is strongly correlated with the existence of a social connection and of common neighbors (common friends on the social network). Once the existence of an online connection and of common neighbors is given, the number of common neighbors has no positive impact on mobility similarity, while higher diversity among common neighbors brings higher similarity in mobility patterns.

Results
The mobility similarity between a pair of individuals is measured with Spatial Cosine Similarity (SCos), which is the cosine similarity of the two individuals' trajectory vectors (see Methods). A higher SCos indicates more similar behaviors. In addition, four metrics are used to measure social network proximity: (i) whether two individuals are friends; (ii) whether they have common neighbors; (iii) how many common neighbors they have; and (iv) the number of connected components in the subgraph induced by these common neighbors.
We start by investigating the effect on mobility similarity of two social network proximity metrics, namely "befriending" (whether two individuals are friends) and "having common friends" (whether they have common neighbors on the social network). Figure 1(a) and (b) report the probability distributions of mobility similarity for pairs of friends and non-friends (see Methods), and for pairs of friends with and without common neighbors, respectively. Figure 1(c) further reports the expected similarity of four configurations: whether or not two individuals are friends and whether or not they have common neighbors. Fig. 1(a) shows that pairs of friends have consistently higher mobility similarity than non-friends, i.e., an average mobility similarity of 0.03084 (friends) versus 0.00049 (non-friends). A similar phenomenon is observed in Fig. 1(b): the average mobility similarity between friends with common neighbors (0.04149) is about 5 times that between friends without common neighbors (0.00810). Figure 1(c) illustrates that the mobility similarity between non-friends is almost uniformly low, whether or not they have common neighbors. Befriending is strongly correlated with high mobility similarity between two individuals, with the average SCos increasing by nearly two orders of magnitude. Having common friends further increases the similarity. Therefore, both "befriending" and "having common friends" imply high mobility similarity between individuals. Friends are indeed much closer in behavior pattern than strangers, and the existence of common neighbors could be another strong predictor of the similarity of individual mobility patterns. A plausible reason is that friends are more likely to live or work together, or to share hobbies, which makes their mobility patterns more similar than those of strangers. Meanwhile, common neighbors strengthen the intimacy between friends. These two factors affect mobility similarity in different aspects and are not mutually replaceable. Furthermore, a null model [37][38][39][40] constructed by rewiring edges in the social network shows that the average SCos decreases by two orders of magnitude in the null model compared with real data, validating the significance of the results (see Supporting Information).
Of the above two metrics, common friends provide richer information about two individuals than befriending, which is a rather intuitive criterion. For example, the number of common neighbors (CN, or the size of the common neighborhood) is often regarded as an indicator of an intimate relationship, both in link prediction research 33, as a local similarity index, and in recommendation algorithms 41, as a structural similarity index. Here, we count the CN of two individuals as the third measurement of social proximity (see Methods). One might expect that friends with more common neighbors have more similar mobility patterns, as suggested in previous studies 28. To our surprise, however, measurements from the following three aspects all reveal no positive impact of CN on mobility similarity. (1) A low Spearman coefficient, 0.046, is observed between SCos and CN, suggesting that these two variables are hardly correlated. (2) We compare the SCos probability distributions of 5 samples of friends (see Methods) with 1, 2, 3, 4, or 4+ common neighbor(s), namely CN = 1, CN = 2, CN = 3, CN = 4 or CN ≥ 4. As shown in Fig. 1(d), friend pairs with different CN (given CN > 0) have almost identical distributions of mobility similarity, indicating that more common neighbors do not bring higher similarity. (3) We compare these samples with pairwise Kolmogorov-Smirnov hypothesis tests. To exclude random error, each test is repeated for 1,000 parallel runs with Bonferroni correction (see Methods and Supporting Information), and no test rejects the null hypothesis. Therefore, we conclude that mobility similarity is independent of the number of common neighbors, since the samples appear identically distributed. In short, all the results consistently indicate that mobility similarity is independent of the number of common neighbors. In other words, multiple common neighbors have the same effect as a single common neighbor when measuring mobility similarity. Among pairs who are friends and have common friends, one is equally likely to find pairs with similar mobility patterns no matter how many common friends they have.
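As a rough illustration of the rank correlation used in point (1), the following sketch computes a Spearman coefficient in plain Python by ranking both variables (ties receive their average rank) and then taking the Pearson correlation of the ranks. The function names and the toy values are hypothetical and are not taken from the dataset; in practice a statistics library would normally be used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


# Toy example (made-up numbers): CN values and SCos values for five pairs.
cn_values = [1, 2, 2, 3, 5]
scos_values = [0.04, 0.01, 0.05, 0.02, 0.03]
rho = spearman(cn_values, scos_values)
```

A rank-based coefficient only asks whether SCos tends to rise or fall as CN grows, so it does not assume a linear relation or normally distributed variables.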
Shall we claim that common neighbors act as a binary switch in shaping friends' mobility similarity, while their details make no difference? No. It is the topological structure among common neighbors, rather than their number, that indicates greater mobility similarity. Define the common neighbor network of a pair of nodes as the subgraph induced by their common neighbors. Figure 2(a) and (b) illustrate that the common neighbor network can have various local organizations for a given CN. For example, for a given CN, the common neighbors of a pair of individuals may cluster into a tightly connected group or remain isolated from one another. This phenomenon inspires us to investigate mobility similarity from the perspective of the micro-structure of common neighbors. Given the number of common neighbors, we measure the diversity of the common neighborhood by its number of connected components (CC). A higher CC signifies more groups and a higher degree of diversity among common neighbors. We collect individual pairs by configurations of CN and CC, such as {CN = 3, CC = 2}, and report the effect of CN and CC on the average SCos in Fig. 2(c) and (d) respectively. Specifically, Fig. 2(c) reports the average mobility similarity against common neighborhood size when we control its diversity, which, surprisingly, shows a consistently decreasing trend: more common neighbors come with weaker mobility similarity. For example, if two individuals have 9 common neighbors in 2 groups, their mobility similarity could be as low as half of that when they have 2 common neighbors in 2 groups (i.e., two distinct common friends). On the other hand, as shown in Fig. 2(d), increasing diversity dramatically increases mobility similarity when the number of common neighbors is controlled. Two individuals having 4 distinct friends (CC = 4, CN = 4) are twice as similar in mobility as those having 4 connected friends (CC = 1, CN = 4). This phenomenon agrees with Ugander et al. 29 that the structural diversity of a social network takes the role of common neighborhood size in shaping individual behaviors. It reveals that diversity of common neighbors is a signal of strong mobility similarity, while, counterintuitively, their number has no positive effect.

Figure 1 (caption). We add 0.01 to each data point to better illustrate zero in a log-log plot. In (a), it is consistently more probable to observe a pair of friends (red circles) with non-zero SCos than a pair of non-friends (blue squares), while the former is much less probable to be observed with zero SCos. Similarly in (b), the pairs with common neighbors (red circles) have higher mobility similarity than those without common neighbors (blue squares). However, almost invisible differences can be seen between the five groups of pairs with CN = 1, 2, 3, 4 and ≥4 common neighbors in (d). In (c), the labels above the bars give the average SCos over all pairs of friends for 4 groups, obtained by intersecting the two factors we observe. The differences between these 4 groups indicate that these two factors are not mutually inclusive. Note that we use a logarithmic scale in (c), and thus the significant difference between red and blue bars appears small.
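The two structural measurements discussed above, the number of common neighbors (CN) and the number of connected components of the common neighbor network (CC), can be sketched as follows. This is a minimal illustration, assuming the social network is stored as a dictionary mapping each node to the set of its friends; the helper names and the toy graph are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """Nodes adjacent to both x and y (x and y themselves excluded)."""
    return (adj[x] & adj[y]) - {x, y}


def count_components(adj, nodes):
    """Number of connected components in the subgraph induced by `nodes`."""
    remaining = set(nodes)
    components = 0
    while remaining:
        components += 1
        stack = [remaining.pop()]
        while stack:
            node = stack.pop()
            # Only follow edges that stay inside the induced subgraph.
            reachable = adj[node] & remaining
            remaining -= reachable
            stack.extend(reachable)
    return components


def cn_and_cc(adj, x, y):
    """Return (CN, CC) for the pair (x, y)."""
    shared = common_neighbors(adj, x, y)
    return len(shared), count_components(adj, shared)


# Toy graph: x and y share four friends; only a and b know each other.
edges = [("x", "y"), ("a", "b")]
edges += [(p, f) for p in ("x", "y") for f in ("a", "b", "c", "d")]
adj = build_adjacency(edges)
cn, cc = cn_and_cc(adj, "x", "y")  # CN = 4, CC = 3 ({a, b}, {c}, {d})
```

Since each component contains at least one node, CC can never exceed CN, which is why comparisons across CN are only meaningful once CC is held fixed.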

Discussions
Various kinds of human behaviors are highly correlated, from temporal to spatial, from online to offline. In this work we analyzed the relation between online social proximity and offline mobility similarity. Our empirical analysis reveals that mobility similarity between two individuals largely depends on their online social network connection, and is further enhanced by the existence of common friends. Given the existence of common friends, their number shows no positive impact on mobility similarity. These results disagree with previous studies that regard the number of common friends as a positive predictor. It is worth noting that the number of connected components proves to be a consistent positive predictor of mobility similarity, though further experiments would be necessary to provide strong evidence.
The results can be explained from two aspects. On the one hand, it is not trivial to explain why individual mobility similarity is strongly related to the existence of common neighbors but hardly influenced by their number. Intuitively, the phenomenon suggests that one common friend is enough to bring a pair of individuals closer, while more common friends have no significant additional effect. On the other hand, when the common neighborhood of two individuals splits into pieces, there is a high probability that these two individuals both belong to several different communities, each of which strengthens their closeness and thus leads to a higher similarity. From the perspective of human behavior analysis and link prediction, the existence of common neighbors is usually connected with direct friendship, but our experiments reveal that the existence of common neighbors and the direct friendship link affect offline mobility similarity separately.
Unlike previous studies that report a positive impact of the number of common neighbors, our empirical results show a negative impact once the diversity of common neighbors is controlled. This could be explained in two ways. Firstly, the size and the number of connected components of the common neighborhood are trivially correlated, i.e., we cannot have 5 groups with fewer than 5 common friends. This correlation might lead to an apparent positive relation between a larger number of common neighbors and a stronger mobility similarity, which in fact comes from the underlying effect of diversity. Secondly, differences in data might also change the conclusion. We collect data from individual check-in records, while previous studies leveraged mobile call GPS records 12,14. The difference could arise from three facts. (1) Mobile call GPS records are coarser, reporting locations of base stations (separated by kilometers). Check-in records are finer, reporting coordinates from mobile device GPS (accuracy within 100 meters). (2) Most human behaviors are trivial and less informative 16, resulting in noisy tracks in mobile call GPS data, which are reported without purpose. In contrast, check-in records are submitted on purpose and are thus believed to have a better signal-to-noise ratio. (3) Different kinds of social networks reflect different social relationships. For example, Twitter is a directed network of follower-following relationships, while QQ in our research is an undirected network with reciprocal relationships. The type of relationship may give rise to discrepancies in behavior patterns, such as transferring information or causing behaviors.
We use SCos to measure mobility similarity owing to its simplicity, universality and high efficiency, although it considers only spatial information. In recent research, however, many alternative metrics have been proposed to describe mobility similarity more accurately and reasonably with temporal, personal or global factors 28,42,43. To verify whether the current results are robust to other similarity metrics, we choose the co-location rate (CoL) 28, which incorporates temporal information to measure mobility similarity. CoL calculates the probability of two individuals going to the same destination at the same time, i.e., the ratio of co-occurrence. A similar phenomenon can be observed from the experimental results based on CoL: befriending, having common friends and diversity of common friends all have a positive influence on mobility similarity, indicating the robustness of our conclusion (see Supporting Information).
Besides, there are many other metrics to measure social proximity in a network. For example, the edge-connectivity between two nodes s and t in a graph, defined as the minimum number of links that must be removed to destroy all paths from s to t, is often used to measure the reachability between two nodes 44,45. Intuitively, the more independent pathways there are between two nodes, the closer their relationship is. We calculate the edge-connectivity of each pair and report the results in Supporting Information. A similar pattern can be observed: edge-connectivity has no positive impact on mobility similarity, just like CN. Consequently, the results again support our conclusion that neither the number of common friends nor the number of pathways between two individuals plays a positive role in increasing the mobility similarity between them.
Preference or personality also affects individual mobility patterns and similarity. Song et al. 15 use a statistical model based on an explore-and-return mechanism to describe individual human mobility. Later, in ref. 13, the authors develop a dichotomy that classifies individuals into two classes, explorers and returners, which have distinct mobility tendencies. We apply their method to our dataset and observe a consistent phenomenon: the mobility similarity between individuals of the same class (groups explorer-explorer and returner-returner) is clearly higher than that between individuals of different classes (group explorer-returner). More details can be found in Supporting Information.
Our analysis provides a statistical view of the coupling between human online social proximity and offline mobility similarity, and encourages a deeper understanding of the role of topological structure when predicting offline behaviors. Generally, the check-in records and the social network correspond to the real physical layer and the virtual social layer of the natural world and human society, respectively. Therefore, the LBSN data we used provide a good medium to couple physical space and social space. Technically, our results could offer new insight and evidence in the fields of location prediction and friend recommendation 17. For example, mining human mobility patterns and leveraging social network information to predict the next location of a certain individual remains a big challenge. With a deeper understanding of the correlation between social proximity and mobility similarity, it will be easier to find a friend (B) of an individual (A) who has a more similar mobility pattern, which is helpful for predicting A's next location from B's trajectories. Conversely, we could also recommend to a target user friends who have a higher mobility similarity with him or her.
Our study leads to a series of open questions. It is challenging and valuable to explain the fact that mobility similarity depends on the existence, but not the number, of online common neighbors. Whether the effect of common neighbors generalizes to more scenarios remains unknown until adequate empirical analysis is done on different types of social networks. It is also valuable to explore the effect of common neighbors connected by different types of edges, e.g., classmates, relatives, professional contacts, etc.

Methods
Data description. Our data is authorized by the Chinese online service provider Tencent, whose instant messaging product (QQ) and mobile check-in service provide the social network information and the temporal-spatial mobility records respectively. QQ users make friends and chat with them online, and travel around offline in their daily life. Accordingly, on the one hand, the social proximity between individuals can be depicted by the network structure. On the other hand, the trajectory sequence of each user and the similarity of mobility patterns between two individuals can be obtained as well. Therefore, this comprehensive dataset includes well-coupled human online and offline behaviors. We were the first to analyze this dataset.
Specifically, the users are sampled from a coastal city of China, while their check-in records cover the whole region of the Chinese mainland. The dataset contains three parts of information, namely, individual demographic information, social relationships and time-stamped check-in records (longitude and latitude with an error no greater than 0.1 km). We remove inactive users with fewer than 100 check-ins for stable statistics, resulting in a dataset of 97,657 users with 617,765 friend links and 28,827,898 check-in records during the second half of 2013. The average degree, average clustering coefficient and assortativity coefficient of the social network are 6.32, 0.09 and 0.12 respectively. As shown in Fig. S1, the degree distribution of the social network follows a power-law function with an exponential cutoff.

Spatial Cosine Similarity (SCos) between two individuals x and y is the cosine similarity of their trajectory vectors:

$$\mathrm{SCos}(x, y) = \frac{\sum_{l} PV(x, l)\, PV(y, l)}{\lVert PV(x) \rVert \, \lVert PV(y) \rVert},$$

where PV(x, l) stands for the probability of individual x visiting location l, PV(x) denotes the vector of these probabilities over all locations, and ||A|| is the modulus of vector A. Only locations and their visit frequencies are considered in this measurement; the visiting sequence is not.
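The SCos measurement, described in the text as the cosine similarity of two individuals' visit-probability vectors, can be sketched as follows. This is an illustrative sketch only: it assumes each check-in has already been mapped to a discrete location identifier (the text does not specify how coordinates are discretized), and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def visit_distribution(checkins):
    """Map a sequence of location ids to visit probabilities PV(x, l)."""
    counts = Counter(checkins)
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()}


def scos(checkins_x, checkins_y):
    """Cosine similarity of two visit-probability vectors.

    Returns 0.0 when either individual has no check-ins (an assumption
    made here to avoid dividing by zero).
    """
    if not checkins_x or not checkins_y:
        return 0.0
    pv_x = visit_distribution(checkins_x)
    pv_y = visit_distribution(checkins_y)
    # Only locations visited by both individuals contribute to the dot product.
    dot = sum(p * pv_y.get(loc, 0.0) for loc, p in pv_x.items())
    norm_x = sqrt(sum(p * p for p in pv_x.values()))
    norm_y = sqrt(sum(p * p for p in pv_y.values()))
    return dot / (norm_x * norm_y)


# Visit order does not matter, only how often each location is visited.
same = scos(["home", "office", "home"], ["office", "home", "home"])
disjoint = scos(["home"], ["gym"])
```

Because most pairs of individuals share no visited location at all, their SCos is exactly zero, which is consistent with the very low averages reported for non-friends.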
These two metrics, CN and SCos, measure the strength and similarity of social relationships from the online and offline aspects. We calculate CN and SCos for all pairs of friends in our dataset, and obtain the distributions of these two metrics (see Supporting Information). It can be observed that both distributions derive from normal or approximate normal distribution. The results provide not only a description of the heterogeneity of human behavior and society, but also evidence for choosing an appropriate correlation coefficient.
Sampling method and unbiased test. Due to the sparsity of the social network (the network density in this study is only 0.00013), there are many more pairs of non-friends than pairs of friends. To reduce statistical error and computational complexity, we randomly select as many pairs of non-friends as there are pairs of friends for the comparative study of the discrepancy in mobility similarity between them (Fig. 1(a) and (c)).
Considering that the degree distribution of our social network is heavy-tailed, i.e., a few individuals have many more friends than the majority, we sample individual pairs without overlap to avoid auto-coupling. Specifically, we sample friend pairs without replacement so that every individual appears in the sample only once, which avoids the influence of hub nodes with plenty of links. In the sampling process, once an edge is chosen, the two nodes connected by it are removed from the sample pool. For example, if individual k has friends i and j and edge (k, i) is chosen, both nodes k and i are removed from the node set and edge (k, j) can no longer be used. The obtained samples are used for investigating the mobility similarity of friends with different numbers of common neighbors.
The sampling proceeds as follows. Initially there are nearly 100 thousand nodes and over 600 thousand edges in the social network. At every step we pick one edge (i, j) at random and remove the nodes i and j from the network. At the end, some isolated nodes without any friend may be left; they are neglected. Thereby an ego-social network 25,46 is obtained in which every edge represents a pair of individuals whose proximity and similarity are measured by CN and SCos. The sampling process is carried out 1000 times to ensure randomness. In every sample, we have about 70 thousand nodes and nearly 40 thousand edges. Taking 1 of the 1000 experiments as an example, 38,553 edges are obtained, of which 24,996 pairs of friends have no common neighbors and the remaining pairs have at least one. In the same sample, the numbers of pairs with CN = 1, 2, 3, 4 and ≥4 are 6459, 2903, 1555, 917 and 2640 respectively.
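The node-disjoint edge sampling described above can be sketched as below. Rather than repeatedly drawing one edge and deleting its endpoints, this sketch shuffles the edge list once and keeps an edge only if neither endpoint has been used; the first usable edge in a uniformly shuffled list is uniformly distributed over the edges still available, so the two procedures should produce samples with the same distribution. The function name and the toy edge list are hypothetical.

```python
import random


def sample_disjoint_edges(edges, rng):
    """Pick edges in random order so that every node appears at most once."""
    pool = list(edges)
    rng.shuffle(pool)
    used = set()    # nodes already removed from the sample pool
    sample = []
    for i, j in pool:
        if i in used or j in used:
            continue  # an endpoint was removed by an earlier pick
        sample.append((i, j))
        used.update((i, j))
    return sample


# Toy network: k is friends with i and j, so at most one of (k, i) and
# (k, j) can end up in a sample.
edges = [("k", "i"), ("k", "j"), ("a", "b"), ("b", "c"), ("c", "d")]
samples = [sample_disjoint_edges(edges, random.Random(seed)) for seed in range(5)]
```

Repeating the call with different seeds mirrors the repeated sampling runs in the text; nodes left without any selectable edge simply do not appear in the sample.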
After sampling, an unbiasedness test is used to ensure that all the samples are consistent with each other. Specifically, we calculate the mean and standard deviation of SCos in each sample and plot their distributions, which show a narrow unimodal shape, illustrating that the 1000 samples are unbiased.
Kolmogorov-Smirnov test. The Kolmogorov-Smirnov test (KS test) is a nonparametric test that can be used to verify whether two empirical samples are drawn from the same distribution. The KS statistic quantifies a distance between the empirical distribution functions of two samples. The null distribution of this statistic is calculated under the null hypothesis that the samples are drawn from the same distribution. The procedure of the KS test can be found in ref. 47 and the Methods of ref. 34. The null hypothesis is that the samples are independent and identically distributed (i.i.d.), with significance level α = 0.01; the alternative hypothesis is that the samples are not i.i.d. In our research, because the test is performed 1000 times, the significance level is revised to adjusted α = 0.01/1000 = 0.00001 according to the Bonferroni correction 48. Therefore, if any one of the p-values is smaller than the adjusted α, the null hypothesis is rejected and it can be deduced that the two samples differ from each other. Conversely, if all the p-values are greater than the adjusted α, we cannot reject the null hypothesis and the two samples are taken to be drawn from the same population. In Supporting Information, we give the results of the KS tests on the distributions with different CN. If the two tested distributions are not i.i.d., the p-values will follow a normal distribution. However, Fig. S5 demonstrates that none of the p-value distributions is normal, indicating that the two distributions are drawn from the same population.
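To make the procedure more concrete, the sketch below computes the two-sample KS statistic (the largest gap between two empirical distribution functions) and applies the Bonferroni-adjusted threshold to a list of p-values. It is a simplified illustration: the p-values themselves are assumed to come from a statistics library and are invented here, and the function names are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # Both step functions only change at observed values, so it is enough
    # to compare them at every observed value.
    for v in set(a) | set(b):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def rejects_null(p_values, alpha):
    """Reject if any p-value falls below the Bonferroni-adjusted level."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return any(p < threshold for p in p_values)


adjusted = bonferroni_alpha(0.01, 1000)             # 0.01 / 1000
d_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])

# Hypothetical p-values from 1000 repeated tests, all far above the threshold.
toy_p_values = [0.2 + 0.0005 * k for k in range(1000)]
decision = rejects_null(toy_p_values, alpha=0.01)
```

The Bonferroni correction is conservative: with 1000 repetitions, a single test must reach p < 0.00001 before the null hypothesis is rejected.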