A universal opportunity model for human mobility

Predicting human mobility between locations has practical applications in transportation science, spatial economics, sociology and many other fields. For more than 100 years, many human mobility prediction models have been proposed, among which the gravity model analogous to Newton’s law of gravitation is widely used. Another classical model is the intervening opportunity (IO) model, which indicates that an individual selecting a destination is related to both the destination’s opportunities and the intervening opportunities between the origin and the destination. The IO model established from the perspective of individual selection behavior has recently triggered the establishment of many new IO class models. Although these IO class models can achieve accurate prediction at specific spatiotemporal scales, an IO class model that can describe an individual’s destination selection behavior at different spatiotemporal scales is still lacking. Here, we develop a universal opportunity model that considers two human behavioral tendencies: one is the exploratory tendency, and the other is the cautious tendency. Our model establishes a new framework in IO class models and covers the classical radiation model and opportunity priority selection model. Furthermore, we use various mobility data to demonstrate our model’s predictive ability. The results show that our model can better predict human mobility than previous IO class models. Moreover, this model can help us better understand the underlying mechanism of the individual’s destination selection behavior in different types of human mobility.

such as the GPS trajectories from vehicles and call detail records from mobile phones. The PWO model assumes that the probability of an individual selecting a destination is proportional to the number of opportunities at the destination and inversely proportional to the total population at the locations whose distances to the destination are shorter than or equal to the distance from the individual's origin to the destination, which can better predict intracity trips. Yan et al. further combine the PWO model with the continuous-time random walks model 39 to obtain a universal model of individual and population 40 , which realizes the prediction of intracity and intercity mobility patterns at both the individual and population levels. Huang et al. propose a novel human mobility model that can capture real-time human mobility in a sustainable and economical manner, which broadens our view. 4 Sim et al. establish a deliberate social tie (DST) model 41 from the perspective of social interactions. The DST model assumes that an individual seeks out social ties only with other individuals whose attribute values are higher than the attribute value of the individual and the attribute values of the intervening opportunities. Motivated by the DST model, Liu and Yan propose an opportunity priority selection (OPS) model that assumes that the destination selected by the individual is the location that presents a higher benefit than the benefit of the origin and the benefits of the intervening opportunities 42 . In general, all of the IO class models [32][33][34][35][36][37][38][40][41][42] share two common assumptions: (i) using an agent to represent all of the individuals; (ii) when selecting a destination, the agent will compare the benefits of different locations. The difference between these IO class models is that the rules for comparing benefits of different locations are different. 
Although the radiation class models [32][33][34][35][36][37] can accurately predict commuting behavior and other IO class models 38,[40][41][42] can accurately predict intracity and/or intercity mobility, an IO class model that can simultaneously describe the individual's destination selection behavior at different spatiotemporal scales is still lacking.
In this paper, we propose a universal opportunity (UO) model to characterize an individual's destination selection behavior. The basic idea of the model is that when an individual selects a destination, she/he will comprehensively compare the benefits of the origin, the destination and the intervening opportunities. Furthermore, we use various mobility data sets to demonstrate the predictive power of our model. The results show that the model can accurately predict different spatiotemporal scale movements such as intracity trips, intercity travels, intercity freight, commuting, job hunting and migration. Moreover, our model can also cover the classical radiation model and OPS model, presenting a new universal framework for predicting human mobility in different scenarios.

Results
Model. We assume that when an individual chooses a destination, as in the radiation model 32 and the OPS model 42 , she/he first evaluates the benefit of each location's opportunities 43 , where the benefit is randomly drawn from a distribution p(z). After that, the individual comprehensively compares the benefits of the origin, the destination and the intervening opportunities and selects a location as the destination. To characterize this comprehensive comparison, we use two parameters α and β. Parameter α reflects the individual's tendency to choose a destination whose benefit is higher than the benefits of the origin and the intervening opportunities. Parameter β reflects the individual's tendency to choose a destination whose benefit is higher than the benefit of the origin when the benefit of the origin is itself higher than the benefits of the intervening opportunities. According to the above assumption, the probability that location j is selected by the individual at location i is

$$Q_{ij} = \int_0^{\infty} \Pr_{m_i + \alpha s_{ij}}(z)\, \Pr_{\beta s_{ij}}(<z)\, \Pr_{m_j}(>z)\, \mathrm{d}z, \qquad (1)$$

where $m_i$ is the number of opportunities at location i, $m_j$ is the number of opportunities at location j, $s_{ij}$ is the number of intervening opportunities 30 (i.e., the sum of the number of opportunities at all locations whose distances from i are shorter than the distance from i to j), $\Pr_{m_i + \alpha s_{ij}}(z)$ is the probability that the maximum benefit obtained after $m_i + \alpha s_{ij}$ samplings is exactly z, $\Pr_{\beta s_{ij}}(<z)$ is the probability that the maximum benefit obtained after $\beta s_{ij}$ samplings is less than z, $\Pr_{m_j}(>z)$ is the probability that the maximum benefit obtained after $m_j$ samplings is greater than z, and α and β are both non-negative with α + β ≤ 1.
Since $\Pr_x(<z) = p(<z)^x$, we obtain

$$Q_{ij} = \int_0^{\infty} \frac{\mathrm{d}\, p(<z)^{m_i + \alpha s_{ij}}}{\mathrm{d}z}\, p(<z)^{\beta s_{ij}} \left[1 - p(<z)^{m_j}\right] \mathrm{d}z = \frac{(m_i + \alpha s_{ij})\, m_j}{[m_i + (\alpha + \beta) s_{ij}]\,[m_i + (\alpha + \beta) s_{ij} + m_j]}. \qquad (2)$$

Normalizing over all possible destinations gives the probability that the individual at location i chooses location j,

$$P_{ij} = \frac{Q_{ij}}{\sum_{k \neq i} Q_{ik}} \propto \frac{(m_i + \alpha s_{ij})\, m_j}{[m_i + (\alpha + \beta) s_{ij}]\,[m_i + (\alpha + \beta) s_{ij} + m_j]}. \qquad (3)$$

This is the final form of the model and we name it the universal opportunity (UO) model. The α and β parameters in the UO model reflect the two behavioral tendencies of the individual when choosing potential destinations (where the opportunity benefit is higher than the benefit of the origin). From Eq. (3), we can see that the larger the value of parameter α, the greater the probability that distant potential destinations will be selected by the individual. We name this behavioral tendency the exploratory tendency. On the other hand, the larger the value of parameter β, the greater the probability that near potential destinations will be selected by the individual. We name this behavioral tendency the cautious tendency. We choose average travel distance and normalized entropy as two fundamental metrics to discuss the influence of the two parameters α and β on individual destination selection behavior. The average travel distance reflects the bulk density of individual destination selection [44][45][46][47] , and normalized entropy reflects the heterogeneity of individual destination selection 48 . As shown in Fig. 1, the two fundamental metrics change with the two parameters in the same regular way, whether the numbers of destination opportunities follow a uniform or a random distribution. When α = 0, β = 1, the average travel distance is the shortest, and the normalized entropy value is the smallest; when α = 0, β = 0, the average travel distance is the longest, and the normalized entropy value is the largest. These regularities follow directly from the definitions of the two parameters. When α is closer to 0 and β is closer to 1, the individual is more cautious, and the probability of choosing near potential destinations is higher, so the average travel distance is shorter and the heterogeneity is stronger. 
When α is closer to 1, β is closer to 0, the individual is more exploratory, and the probability of choosing distant potential destinations is higher, so the average distance is increased while the heterogeneity is decreased. When α and β are both closer to 0, the individual attaches more importance to the benefit that the location brings to him/her and does not care about the order of locations, so the longer the average travel distance and the stronger the homogeneity.
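The closed form of the model can be evaluated numerically. The following is a minimal sketch, assuming the unnormalized probability is (m_i + α s_ij) m_j / {[m_i + (α + β) s_ij][m_i + (α + β) s_ij + m_j]} and is normalized over all destinations j ≠ i. The function name `uo_probabilities`, the exclusion of the origin's own opportunities from s_ij, and the five-location toy geometry are illustrative assumptions, not part of the original study.

```python
import numpy as np

def uo_probabilities(m, dist, alpha, beta):
    """Destination-choice probabilities P[i, j] of the UO model (sketch).

    m     : (N,) positive numbers of opportunities per location
    dist  : (N, N) pairwise distances between locations
    alpha : exploratory tendency; beta : cautious tendency
    """
    m = np.asarray(m, dtype=float)
    dist = np.asarray(dist, dtype=float)
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("need alpha >= 0, beta >= 0 and alpha + beta <= 1")
    n = len(m)
    q = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Intervening opportunities: locations strictly closer to i than j is.
            # Assumption: the origin itself is excluded (it enters through m[i]).
            closer = dist[i] < dist[i, j]
            closer[i] = False
            s = m[closer].sum()
            base = m[i] + (alpha + beta) * s
            q[i, j] = (m[i] + alpha * s) * m[j] / (base * (base + m[j]))
    # Normalize each row so that the probabilities from one origin sum to 1.
    return q / q.sum(axis=1, keepdims=True)

# Toy example: five equally sized locations on a line, one distance unit apart.
pos = np.arange(5.0)
dist = np.abs(pos[:, None] - pos[None, :])
m = np.ones(5)
for name, (a, b) in {"radiation": (0, 1), "OPS": (1, 0), "OO": (0, 0)}.items():
    p = uo_probabilities(m, dist, a, b)
    print(name, "mean trip length from location 0:", (p[0] * dist[0]).sum())
```

In this toy setting the mean trip length from the end location is shortest for (α, β) = (0, 1) and longest for (0, 0), in line with the regularities described above.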
Moreover, when α and β take extreme values (i.e., the three vertices of the triangle in Fig. 1), we can derive three special human mobility models. When α = 0, β = 0, we name this model the opportunity only (OO) model (see details in Supplementary Information, The derivation of the OO model). In this model, the individual chooses the location whose benefit is higher than the benefit of the origin. Then, the probability of the individual at location i choosing location j as the destination is

$$P_{ij} \propto \frac{m_j}{m_i + m_j}.$$

When α = 1, β = 0, our model can be simplified to the OPS model, in which the individual chooses the location whose benefit is higher than the benefit of the origin and the benefits of the intervening opportunities (see details in Supplementary Information, The derivation of the OPS model). Then, the probability of the individual at location i choosing location j as the destination is

$$P_{ij} \propto \frac{m_j}{m_i + s_{ij} + m_j}.$$

When α = 0, β = 1, our model can be simplified to the radiation model, in which the individual chooses the location whose benefit is higher than the benefit of the origin and the benefits of the intervening opportunities are lower than the benefit of the origin (see details in Supplementary Information, The derivation of the radiation model). Then, the probability of the individual at location i choosing location j as the destination is

$$P_{ij} \propto \frac{m_i\, m_j}{(m_i + s_{ij})(m_i + s_{ij} + m_j)}.$$

From Eqs.

www.nature.com/scientificreports

trips in London (LOT) and intracity trips in Berlin (BET) (see Methods), to validate the predictive ability of the UO model. We first extract the flux $T_{ij}$ from location i to location j from the data set and obtain the real mobility matrix. Then, we exploit the Sørensen similarity index 38 (SSI, see Methods) to calculate the similarity between the real mobility matrix and the mobility matrix predicted by the UO model under different parameter combinations. The results are shown in Fig. 2. 
Figure 2o shows the optimal values of the parameters α and β corresponding to the highest SSI for the fourteen data sets.
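The search for the optimal parameter pair can be sketched as a scan over the triangle α ≥ 0, β ≥ 0, α + β ≤ 1. The sketch below is illustrative only: the grid step of 0.1 and the names `best_parameters`, `predict` and `score` are assumptions, since the text does not specify how the parameter combinations were enumerated. Here `predict` stands for any function that returns a predicted mobility matrix for a given (α, β), and `score` for a similarity measure such as the SSI against the real mobility matrix.

```python
from itertools import product

def best_parameters(predict, score, step=0.1):
    """Scan alpha, beta >= 0 with alpha + beta <= 1; return (score, alpha, beta)."""
    n = int(round(1 / step))
    best = None
    # Integer grid indices avoid accumulating floating-point error in the step.
    for a, b in product(range(n + 1), repeat=2):
        if a + b > n:  # outside the triangle alpha + beta <= 1
            continue
        alpha, beta = a / n, b / n
        value = score(predict(alpha, beta))
        if best is None or value > best[0]:
            best = (value, alpha, beta)
    return best

# Toy usage with stand-in functions whose optimum lies at (0.3, 0.5).
toy_predict = lambda alpha, beta: (alpha, beta)
toy_score = lambda p: -((p[0] - 0.3) ** 2 + (p[1] - 0.5) ** 2)
result = best_parameters(toy_predict, toy_score)
print("best alpha, beta:", result[1], result[2])
```

Passing the model and the similarity measure as callables keeps the scan independent of how the predicted matrix is produced.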
It can be seen from Fig. 2a-d that for USC, ITC, HUC and CNF, when α is close to 0 and β is close to 1, the SSI is relatively large. The reason is that for commuting data sets (USC, ITC and HUC), the commuting distance or time is very important for commuters. As a result, most people tend to choose near potential destinations when finding a job based on their place of residence or adjusting their place of residence after finding a job. This cautious destination selection tendency also exists in freight. Freight to far destinations will lead to an increase in transportation costs and a decrease in the freight frequency, which will have a negative impact on freight revenue. Thus, unless the destination opportunity benefit is very high, the individual tends to choose a near destination rather than a far destination for freight. For the migration and job hunting data sets (USM and CNJ), when α is close to 1 and β is close to 0, the SSI is relatively large, as shown in Fig. 2e,f. The reason is that both job seekers and migrants pay more attention to the destination opportunity benefit than to the distance to the destination. In other words, they are more exploratory but less cautious. Even if a high benefit destination is far away, it will still be selected by individuals with a relatively high probability. The reason is that the distance to the destination has a smaller impact on long temporal scale mobility behaviors, such as migration and job hunting, than on daily commuting behaviors. For intercity travel data sets (CNT, UST and BLT), when α and β are both near the middle of the diagonal line of the triangle, the SSI is relatively large, as shown in Fig. 2g-i. For most people, intercity travel is occasional and not as frequent as commuting. Travelers are less inclined than commuters to choose near potential destinations and more inclined to explore distant potential destinations. 
Thus, the exploratory tendency parameter α of intercity travels is much larger than that of commuting. On the other hand, the importance of the travel cost of intercity travels is higher than that of the cost of migration. Thus, the cautious tendency parameter β of intercity travels is larger than that of migration. For intracity trips data sets (SZT, BJT, SHT, LOT and BET), when α and β are both close to 0, the SSI is relatively large, as shown in Fig. 2j-n. The reason is that compared with the intercity mobility behavior on a large spatial scale, the spatial scale of intracity mobility behavior is small. In this scenario, the individual is not necessarily concerned about the travel distance and focuses more on the benefit that the location will directly bring to him/her. Thus, the optimal values of α and β are both close to 0, as shown in Fig. 2o.
We next compare the predictive accuracy of the mobility fluxes of the UO model with the radiation model, the OPS model and the OO model. In terms of SSI, as shown in Fig. 3 and Table 1, the UO model performs best. However, the radiation model and the OPS model can provide only relatively accurate predictions for some data sets. For example, the radiation model can predict commuting and freight trips relatively accurately but cannot accurately predict other types of mobility. The reason is that the individual tends to choose near potential destinations rather than distant potential destinations in commuting and freight, where travel costs are more important. From Fig. 2o, we can see that for commuting and freight data sets, the optimal parameter β (which reflects the individual's cautious tendency) of the UO model is close to 1, and the optimal parameter α (which reflects the individual's exploratory tendency) is close to 0. Therefore, the prediction accuracy of the radiation model in which the individual only chooses the closest potential destination (i.e., α = 0, β = 1) is close to that of the UO model in commuting and freight data sets. However, the prediction accuracy of the radiation model is considerably lower than that of the UO model in job hunting, migration and noncommuting travel data sets. The reason is that the individual is more likely to choose distant potential destinations in these data sets. In these cases, the prediction accuracy of the OPS model, in which the individual tends to choose distant potential destinations, is closer to that of the UO model. We further measure the fluxes predicted by different models compared with the real fluxes and find that the average fluxes predicted by our model are more in agreement with real observations than the other three models (see details in Supplementary Information, Comparison among different models). 
Table 1. Comparison of the models' prediction accuracy. SSI is the Sørensen similarity index between the real mobility matrix and the mobility matrix predicted by different models. RMSE is the root mean square error of the predicted mobility matrix. UO, RM, OPS, and OO stand for the universal opportunity model, the radiation model, the opportunity priority selection model and the opportunity only model, respectively.

We also use a frequently used statistical index, the root mean square error (RMSE), to measure the prediction errors of the UO model and the other three models, and Table 1 lists the results. From the table, we can see that in most cases, the RMSE of the UO model is smaller than that of the other benchmark models, although the RMSE is not the parameter optimization objective of the UO model. These results indicate that the three benchmark models only capture the individual's destination selection behavior at a specific spatiotemporal scale, whereas the UO model can accurately describe the individual's destination selection behavior at different spatiotemporal scales.
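For reference, the RMSE between a predicted and a real mobility matrix can be computed as in the sketch below. The text does not state which matrix entries enter the average, so restricting it to off-diagonal pairs (i ≠ j) is an assumption made here, as is the function name `rmse`.

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean square error between two flux matrices (off-diagonal entries)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Assumption: self-flows T_ii are not part of the comparison.
    off_diag = ~np.eye(observed.shape[0], dtype=bool)
    diff = predicted[off_diag] - observed[off_diag]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy usage with a made-up 2 x 2 flux matrix.
observed = [[0, 3], [4, 0]]
predicted = [[0, 0], [0, 0]]
print("RMSE:", rmse(predicted, observed))
```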

Discussion
Although previous IO class models are widely used to predict the mobility of people between locations [32][33][34][35][36][37][38][40][41][42] , these models can only achieve accurate prediction at specific spatiotemporal scales. In this paper, we developed a UO model to predict human mobility at different spatiotemporal scales. Our model establishes a new framework in IO class models and covers the classical radiation model 32 and the OPS model 42 . Although the UO model has two parameters, they differ from the parameters in some regression analysis models or machine learning models, which serve simply to improve the prediction accuracy of the model. The two parameters of the UO model essentially describe two tendencies, i.e., the exploratory tendency and the cautious tendency, of an individual's destination selection behavior. They not only enable the UO model to better predict human mobility at different spatiotemporal scales than the parameter-free models but also help us better understand the underlying mechanism of the individual's destination selection behavior in different types of human mobility.
Many phenomena in the field of complex systems are strongly related to human mobility 31 . For example, the spread of disease is directly affected by human travel distance between locations and the population size of locations [15][16][17][49][50][51][52] . The UO model can accurately describe the individual's destination selection behavior at different spatiotemporal scales, which has potential applications for understanding the spread of disease among humans. In addition, the IO model can also describe an individual's selection behavior in social networks such as friend networks and scientific collaboration networks. In friend networks, the individual tends to choose friends who are close to him/her and have a high sense of identity 41,53 . In scientific collaboration networks, the individual tends to choose nearby scholars who have high scientific influence 54 . These phenomena indicate that when one seeks to build beneficial ties, she/he will take into account both the distance and the benefits of the opportunities. The UO model can describe the individual's interactive object selection behavior, providing a new perspective for social network analysis.
Despite its fine performance in predicting human mobility, the UO model has room for further improvement. For example, most existing IO class models use an agent to represent all of the individuals and neglect the diversity of individual selection behavior 46,[55][56][57][58][59] . Building a mobility prediction model for each individual may reflect the diversity in detail. However, this is extremely cumbersome and cannot capture the commonality among individuals' mobility patterns. One possible approach is to first cluster individuals according to their mobility behavior characteristics [60][61][62] and then extend our UO model to the different classes of individuals, which may predict human mobility more accurately.

Material and methods. Data sets.
(1) Commuting trips. The commuting trips data sets include the commuting trips between United States' counties 32 (USC), the commuting trips between the provinces of Italy 35 (ITC) and the commuting trips between the subregions of Hungary 35 (HUC), which were downloaded from http://www.census.gov/population/www/cen2000/commuting/index.html, http://www.stat.it/storage/cartografia/matrici_pendolarismo/matrici_pendolarismo_2011.zip and http://www.ksh.hu, respectively. Since we focus on mobility among zones (counties, provinces or subregions), all the residences/workplaces within a zone are regarded as the same with an identical zone label. Then, we can accumulate the total number $T_{ij}$ of trips from zone i to zone j, which is also carried out in the following data sets.

(2) Freight between Chinese cities (CNF). The CNF data set is extracted from the travel records of freight between Chinese cities from 19 May 2015 to 23 May 2015. When freight is loaded or unloaded, the coordinates and time are recorded automatically by a GPS-based device installed in the truck. All the loading/unloading locations within a city are regarded as the same with an identical zone label.

(3) Internal job hunting in China (CNJ). The CNJ data set is extracted from more than 160 million job hunters' resumes from 2006 to 2016 and was downloaded from https://www.zhaopin.com. The resumes contain the job hunters' work experience, from which we can obtain a job hunter's former workplaces. All the workplaces within a city are regarded as the same with an identical zone label.

(4) Internal migrations in the US (USM). The USM data set is extracted from the Statistics of Income Division of the Internal Revenue Service (IRS) in the US from 2011 to 2012 and was downloaded from https://www.irs.gov/statistics/soi-tax-stats-migration-data. The IRS data contain records of all individual income tax forms filed in each year, from which we can determine who has, or has not, moved residence/workplace locations in the intervening fiscal year 31 . All the residence/workplace locations within a state are regarded as the same with an identical zone label.

(5) Intercity travels. The intercity travels data sets include intercity travels in China (CNT), intercity travels in the US (UST) and intercity travels in Belgium (BLT). The CNT data set is extracted from check-in records of the Sina Weibo website for users in mainland China 40 . The UST data set is extracted from check-in records of the Foursquare website for users in the continental US 63 . The BLT data set is extracted from check-in records of the website Gowalla for users in Belgium 64 . These data sets contain each user's spatial and temporal information, from which we can obtain the user's location. All the check-in locations within a city are regarded as the same with an identical zone label.

(6) Intracity trips. The intracity trips data sets include intracity trips in Suzhou (SZT), intracity trips in Beijing (BJT), intracity trips in Shenzhen (SHT), intracity trips in London (LOT) and intracity trips in Berlin (BET). The SZT data set is extracted from the mobile phone call detail records in Suzhou, a city in China. The data contain the times and positions of users making phone calls or sending text messages. The BJT data set 65 and the SHT data set 65 are extracted from the travel records of taxi passengers in Beijing and Shenzhen, respectively. When a passenger gets on or off a taxi, the coordinates and time are recorded automatically by a GPS-based device installed in the taxi. The LOT data set 64 and the BET data set 64 are extracted from check-in records at Gowalla in London and Berlin. Because of the absence of natural partitions in cities (in contrast to states or counties), the city is divided into zones, each of which is 1 km × 1 km (for SZT, 0.01° longitude × 0.01° latitude). All the locations within a zone are regarded as the same with an identical zone label 38 .
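The zone labelling and trip accumulation described above can be sketched as follows, using a grid in degrees as for the SZT data set. The record layout (origin and destination longitude/latitude per trip), the function names `zone_of` and `od_counts`, the sample coordinates, and the choice to drop trips that start and end in the same zone are illustrative assumptions, not details taken from the data sets.

```python
import math
from collections import Counter

def zone_of(lon, lat, cell=0.01):
    """Label a point by the grid cell that contains it (cell size in degrees)."""
    return (math.floor(lon / cell), math.floor(lat / cell))

def od_counts(trips, cell=0.01):
    """Accumulate the number of trips T_ij between zones.

    trips: iterable of (lon_origin, lat_origin, lon_dest, lat_dest) tuples.
    """
    counts = Counter()
    for lon_o, lat_o, lon_d, lat_d in trips:
        origin = zone_of(lon_o, lat_o, cell)
        dest = zone_of(lon_d, lat_d, cell)
        if origin != dest:  # assumption: within-zone moves are ignored
            counts[origin, dest] += 1
    return counts

# Toy usage with made-up coordinates.
trips = [
    (120.615, 31.305, 120.635, 31.305),
    (120.615, 31.305, 120.635, 31.305),
    (120.615, 31.305, 120.616, 31.306),  # stays inside one cell
]
for (origin, dest), t in od_counts(trips).items():
    print(origin, "->", dest, ":", t)
```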
Normalized entropy. We use normalized entropy to reflect the heterogeneity of individual destination selection,

$$E_i = -\frac{\sum_{j} p_{ij} \log p_{ij}}{\log N},$$

where $E_i$ is the normalized entropy of location i, $p_{ij}$ is the probability that the individual at location i chooses location j as his/her destination, and N is the number of locations.
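As a minimal sketch, the normalized entropy of one origin can be computed from its row of destination probabilities. Dividing the Shannon entropy by log N and treating terms with zero probability as zero are assumptions made here, and `normalized_entropy` is a hypothetical helper name.

```python
import numpy as np

def normalized_entropy(p_row):
    """Shannon entropy of a probability vector divided by log(N)."""
    p = np.asarray(p_row, dtype=float)
    n = p.size
    nonzero = p[p > 0]  # a term 0 * log(0) is taken as 0
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(n))

# Toy usage: an even spread over destinations versus a single certain destination.
print("uniform:", normalized_entropy([0.25, 0.25, 0.25, 0.25]))
print("concentrated:", normalized_entropy([1.0, 0.0, 0.0, 0.0]))
```

A value near 1 means the destinations are chosen almost evenly, and a value near 0 means the choices concentrate on very few destinations.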
Sørensen similarity index. The Sørensen similarity index 66 is a similarity measure between two samples. Here, we apply a modified version 38 ,

$$\mathrm{SSI} = \frac{1}{N^2} \sum_{i}^{N} \sum_{j}^{N} \frac{2 \min(T_{ij}, T'_{ij})}{T_{ij} + T'_{ij}},$$

where N is the number of locations, $T_{ij}$ is the predicted flux from location i to j and $T'_{ij}$ is the empirical flux. Obviously, if each $T_{ij}$ is equal to $T'_{ij}$ the index is 1, and if all $T_{ij}$ are far from the real values, the index is close to 0.
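A minimal sketch of this index, assuming the form SSI = (1/N²) Σ_i Σ_j 2 min(T_ij, T'_ij) / (T_ij + T'_ij), is given below. The treatment of pairs where both fluxes are zero is not specified in the text; counting them as perfect agreement (a term equal to 1) is an assumption chosen so that identical matrices give an index of exactly 1. The function name `sorensen_similarity` is hypothetical.

```python
import numpy as np

def sorensen_similarity(predicted, observed):
    """Modified Sørensen similarity index between two N x N flux matrices."""
    t = np.asarray(predicted, dtype=float)
    t_obs = np.asarray(observed, dtype=float)
    total = t + t_obs
    # Assumption: pairs with zero flux in both matrices count as agreement (1).
    ratio = np.divide(2 * np.minimum(t, t_obs), total,
                      out=np.ones_like(total), where=total > 0)
    return float(ratio.sum() / t.shape[0] ** 2)

# Toy usage with made-up 2 x 2 flux matrices.
predicted = [[0, 1], [3, 0]]
observed = [[0, 3], [1, 0]]
print("SSI:", sorensen_similarity(predicted, observed))
```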

Data availability
Data available on request from the authors.