Scaling Law of Urban Ride Sharing

Sharing rides could drastically improve the efficiency of car and taxi transportation. Unleashing such potential, however, requires understanding how urban parameters affect the fraction of individual trips that can be shared, a quantity that we call shareability. Using data on millions of taxi trips in New York City, San Francisco, Singapore, and Vienna, we compute the shareability curves for each city, and find that a natural rescaling collapses them onto a single, universal curve. We explain this scaling law theoretically with a simple model that predicts the potential for ride sharing in any city, using a few basic urban quantities and no adjustable parameters. Accurate extrapolations of this type will help planners, transportation companies, and society at large to shape a sustainable path for urban growth.

against λ for New York City 16 closely resembles a "fast" saturation process, with a quick increase from the lowest density, where shareability is minimal, to saturation, where all trips can be shared. Our first main finding is that three other cities (San Francisco, Singapore, and Vienna; see Methods and Supplementary Information: Table S1a for dataset descriptions and algorithms 17) show strikingly similar shareability curves (Fig. 1(b)-(d)). Such similarity is remarkable, given that the shareability curves are obtained from datasets of real taxi trips, using a methodology that includes the hour-by-hour variability in traffic congestion (see Methods).
Each curve in Fig. 1 saturates rapidly as a function of λ. Their rapid saturation distinguishes them from other saturation phenomena observed in urban and geographical processes, such as the growth of retail locations 18 and the spreading of innovations 19, which are instead characterized by an initial "slow start" phase with a sigmoidal shape. Fast saturation of shareability is a plausible explanation for the great success of innovative ride and vehicle sharing apps such as UberPool™, ZipCar™, and Car2Go™.
The similarity we observe between cities actually goes beyond the resemblance of their shareability curves: a single linear rescaling of the λ-axis makes all the curves nearly coincident (Fig. 2; see also Methods and Supplementary Information: Table S2), suggesting that a common mechanism governs shareability in those four cities. The data collapse is achieved by replotting the computed shareability S versus the dimensionless quantity

$$L = \frac{\lambda\, v^{2}\, \Delta^{3}}{|\Omega|}, \qquad (1)$$

where λ is the rate at which trips are generated, v is the average traffic speed, Δ is the maximum sharing delay that passengers tolerate, and |Ω| is the area of the city. The greater the L, the greater the shareability. The quality of the data collapse indicates that a few urban-level parameters, when combined into the dimensionless group L, suffice to accurately model a complex quantity like the fraction of trips that can be shared in a city. This result is all the more surprising when one considers that L is defined in terms of the average daily traffic speed in the city, while the shareability curves have been derived using hourly, street-level traffic speed estimates. Evidently, the variability of congestion occurring at different times of day has a limited effect on our predictions. We do not ignore that variability; on the contrary, it is captured by the data we use to derive the shareability curves. Yet the fact that our simple model accounts so well for those shareability curves demonstrates that the variability is not a dominant effect. The universal curve can be explained by using a single, average value of the traffic velocity v.
The particular combination of urban parameters in Eq. (1) can be rationalized by dimensional analysis. Intuitively, L represents a ratio between two timescales: the sharing delay Δ and the characteristic waiting time t_wait for a trip to be generated in a user's vicinity. To see this, imagine that you are looking for a cab. Since λ is the average rate at which taxi trips are generated, 1/λ is the characteristic time for a new trip to be generated somewhere in the city. But the city as a whole is not what concerns you. What matters more is how long you can expect to wait for a new trip to be generated in your vicinity. The characteristic linear scale of a vicinity is vΔ, the distance a cab moving at speed v would travel in the delay time Δ that another passenger could tolerate. Since the city has a total area |Ω| and each vicinity has area (vΔ)², there are about |Ω|/(vΔ)² vicinities in total. Assuming that trips are generated uniformly in space, you would expect a trip to be generated in your vicinity every

$$t_{\mathrm{wait}} = \frac{1}{\lambda}\,\frac{|\Omega|}{(v\Delta)^{2}}$$

time units. The ratio of the two timescales is then Δ/t_wait = λv²Δ³/|Ω|, which is the dimensionless group L of Eq. (1).

At a more refined level, the influence of urban parameters on shareability can be approached mathematically as follows. Intuitively, one expects that shareability should be positively related to Δ, to the average traffic speed v(C) in city C, and to λ. Indeed, a larger Δ means that people tolerate longer sharing delays, and a larger v(C) means that a larger urban space can be covered without exceeding that delay 16. The effect of increasing trip density is more complex to assess, since it simultaneously introduces new rides and new ride-sharing opportunities. However, the additional trips are drawn from the same distribution as the original ones, so they possess similar spatiotemporal properties, which on average results in an increase of shareability as a function of λ.
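As a minimal numerical sketch of the dimensional argument for L, the snippet below counts the vicinities, derives the local waiting time t_wait, and forms the ratio Δ/t_wait, which it then compares with the product λv²Δ³/|Ω|. The function names and all parameter values (trip rate, speed, delay, area) are illustrative assumptions and are not taken from the datasets.

```python
def waiting_time(rate, speed, delay, area):
    """Characteristic time between trips generated in one vicinity.

    rate  : trips generated per hour in the whole city (lambda)
    speed : average traffic speed in km/h (v)
    delay : maximum tolerated sharing delay in hours (Delta)
    area  : city area in km^2 (|Omega|)
    """
    vicinity_area = (speed * delay) ** 2    # (v * Delta)^2
    n_vicinities = area / vicinity_area     # |Omega| / (v * Delta)^2
    return n_vicinities / rate              # (1 / lambda) * number of vicinities


def dimensionless_group(rate, speed, delay, area):
    """L as the ratio of the sharing delay to the local waiting time."""
    return delay / waiting_time(rate, speed, delay, area)


# Hypothetical city: 1000 trips/h, 20 km/h, 5-minute delay, 60 km^2.
rate, speed, delay, area = 1000.0, 20.0, 5.0 / 60.0, 60.0
L = dimensionless_group(rate, speed, delay, area)
L_closed_form = rate * speed**2 * delay**3 / area
print(f"t_wait = {waiting_time(rate, speed, delay, area) * 60:.2f} min, L = {L:.3f}")
```

Because L is linear in λ but cubic in Δ, doubling the tolerated delay in this sketch raises L eightfold, whereas doubling the trip rate only doubles it.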
Assuming that rides are generated independently in the city according to a given spatiotemporal distribution, we wish to compute the probability that a ride can be shared as a function of Δ, v(C), and λ. Tackling this problem directly is very difficult, since the probability of actually sharing a ride depends not only on the spatiotemporal availability of candidate trips to share, but also on how potentially shareable trips are paired together, which in turn depends on complex structural properties of the underlying shareability network. Nevertheless, the spatial dimension of the problem, coupled with the observed fast saturation of the shareability curve, suggests analogies with geometric random graphs 20 and percolation theory 21. A common trait of these theories is that complex structural properties of a network, such as connectivity, can be closely approximated by much simpler properties, such as the existence of isolated nodes. This turns out to be the case for shareability networks as well: we find that shareability S is highly correlated with the number of isolated nodes in the shareability network (Methods).
Based on the above discussion, we can model shareability by fixing an arbitrary trip T and estimating the probability that there exists at least one other trip T′ shareable with T. More specifically, an arbitrary trip T starting at time t_0 and going from origin o to destination d defines a trajectory in space and time. For fixed Δ and average traffic speed v(C), we define the notion of the shareability shadow s(T) surrounding T and confining the region of sharing opportunities (Supplementary Information: Figs S1 and S2). For another trip T′ to be shareable with T, its trajectory needs to overlap (i.e., to take place at the same time, at least partially) and to be "aligned" (i.e., not deviate too much direction-wise) with s(T). Those two conditions simply translate our upper bound Δ on delays into a geometric condition stating that shareable trips should be close enough in terms of trajectories, where "close enough" is quantified through the volume of s(T), which depends on v(C) and Δ. Analytically, the expected shareability becomes the probability that a compatible trip will be generated in the shareability shadow (see Supplementary Information: "Supplementary Equations"). To compute that quantity, the previously mentioned spatiotemporal distribution of trips has to be determined. Among the different options we considered, the following one gave the best compromise between accuracy and tractability: origin point o chosen uniformly in Ω, and destination point d chosen uniformly in a disk of radius R centered on o (ignoring boundary effects for the sake of simplicity). The geometry of the city plays a minimal part in the definition, which allows us to derive analytical formulas for the shareability. For R large enough, we find that S becomes independent of R, and the city's influence on the shareability only appears through the quantity L.
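The trip-generation model described above (origin uniform in Ω, destination uniform in a disk of radius R around the origin) can be sketched as follows. The rectangular shape of Ω, the function name `sample_trip`, and the numerical values are assumptions made only for illustration; the model itself requires nothing beyond a region of area |Ω|.

```python
import math
import random


def sample_trip(width, height, radius, rng):
    """Draw one (origin, destination) pair.

    Assumption: Omega is a width x height rectangle. Boundary effects are
    ignored, so the destination may fall outside the rectangle.
    """
    ox, oy = rng.uniform(0.0, width), rng.uniform(0.0, height)
    # Uniform point in a disk: the radial coordinate must scale as sqrt(u),
    # otherwise the points cluster near the centre.
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (ox, oy), (ox + r * math.cos(theta), oy + r * math.sin(theta))


rng = random.Random(0)
trips = [sample_trip(10.0, 6.0, 3.0, rng) for _ in range(10_000)]
lengths = [math.dist(o, d) for o, d in trips]
mean_length = sum(lengths) / len(lengths)
print(f"mean straight-line trip length: {mean_length:.2f}")
```

For a destination uniform in a disk, the mean straight-line trip length is 2R/3, which gives a quick sanity check on the sampler.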
We prove that (see Supplementary Information)

$$S = 1 - \frac{1}{2L^{3}}\left(1 - e^{-L}\right)\left(1 - (1 + 2L)\,e^{-2L}\right).$$

We tested our model predictions on the four cities mentioned above and found strong agreement with the respective shareability curves (Supplementary Information: Fig. S3), with R² values ranging from 0.91 to 0.98.

Discussion
Our main contribution is the discovery of a unifying mathematical law that governs the potential for ride sharing in cities of diverse sizes and traffic characteristics. We have also proposed a simple model that accounts for the law, and does so with no adjustable parameters. The fidelity of the model suggests that the mechanisms governing ride sharing, in a real-world scenario where trips are performed on a road network with traffic congestion, can be accurately characterized by a model built on such simplifying assumptions as Euclidean geometry, straight-line trajectories, and extremely basic shareability shadow shapes. In particular, most of the knowledge required to determine shareability is contained in the dimensionless group L. Being relatively easy to estimate, and the only quantity required for our (otherwise parameter-free) framework, L gives the model strong predictive power.
A final important feature of the framework is its flexibility, which allows for potential enhancements. Relaxing some of the model's underlying assumptions might produce even greater accuracy. In particular, the shape of the shareability shadows could be modeled using conics. Alternatively, more realistic origin-destination patterns could be considered, but at the expense of closed-form formulas for shareability; the four-dimensional integral representing it (Supplementary Information: "Supplementary Equations") would then require stochastic approximations to be computed (using the VEGAS algorithm, for instance 22). On a potentially more impactful note, if the exact effect of congestion on average speed were known, through modeling or pervasive sensors, the model could be extended to take the following second-order effect into account: ride sharing reduces congestion and increases average travel speed, thereby increasing shareability (see Supplementary Information: "Supplementary Equations").
The in-depth study of the cities and their curves (Figs 2 and S3) shows a few interesting features. New York City cabs live up to their reputation with a shareability curve reaching over 99% compared to ~97% for the other three cities. The very high and homogeneous density of people in Manhattan (the only borough of New York City included in our study) might explain this difference between cities. For San Francisco, Singapore, and Vienna, the entire city, including more sparsely populated areas, was considered. It is possible that those areas' "outlier" trips create small discrepancies between shareability curves. Large uninhabited places (e.g., Lainzer-Tiergarten in Vienna, and Singapore's central water catchment) were taken into account while computing the cities' areas, which might also explain certain differences between them.
Our findings quantify the effects of ride sharing on the urban environment and shed light on the recent upheaval that ride sharing has caused in cities worldwide. Furthermore, they offer valuable guidance towards designing more efficient mobility systems in the future. Tables 1 and 2 show the urban parameters and corresponding predictions for ride shareability in several major world cities. Even for low trip density, and allowing delays no longer than Δ = 5 minutes, the potential for sharing is massive.

Methods
The New York dataset was obtained from the New York Taxi and Limousine Commission for the year 2011 via a Freedom of Information Act request. It is the same as the dataset used in refs 16 and 17. The San Francisco dataset is freely available 23. The Vienna and Singapore datasets were provided to the MIT SENSEable City Lab by AIT and the Singapore government, respectively.
The New York dataset spans more than an entire year and contains all taxi trips generated in the area of New York by its approximately 13,500 taxis. The other datasets each span roughly a month and contain records provided by a single taxi operator. The total number of cabs in San Francisco is officially 1,494 24, and the number of taxis tracked in the dataset is about 500. For Singapore, the official figure is 25,176 25, and our dataset refers to   Table S1. We have applied the same filtering procedure to all the datasets: only trips performed while a customer occupied the taxi were considered in the analysis. From these trips, we only kept the ones with start and end GPS positions within 200 meters of the closest intersection present in the considered area of study. The area of study comprises the borough of Manhattan (NY), the entire island of Singapore (SI), and the urban areas of both San Francisco (SF) and Vienna (VI), including the road to the airport. We included the airports of San Francisco, Singapore, and Vienna since trips to and from them account for a substantial fraction of the dataset.
The intersections were obtained from OpenStreetMap 26 (see Supplementary Information, Fig. S4), considering only primary and secondary level roads and manually merging all repeated elements corresponding to the same intersection (using QGIS 27). All trip coordinates were provided as longitude-latitude pairs on the WGS84 ellipsoid but have been projected to Euclidean UTM coordinates using the zones specified in the Supplementary Information, Table S1a.
After pre-processing, each trip is uniquely identified by a tuple containing starting and ending (latitude, longitude) coordinates, which correspond to the coordinates of the intersections closest to the start and end of the trip, and by pickup and dropoff times. Pickup and dropoff times are used to estimate travel times between any two intersections in the city for each of the 24 hours of the day, according to the procedure described in ref. 16. This method accounts for the effect of traffic on travel time when computing the shareability networks used to obtain the shareability curves shown in the paper. Shareability networks for the four cities were obtained using the method described in ref. 16.
To generate the saturation curves used in the paper, two procedures were required. For the New York, San Francisco, and Singapore datasets, for which the shareability curves are saturated, the lower parts of the curves (corresponding to small trip densities λ) were obtained by randomly and uniformly subsampling the database of actual trips up to the desired density. For the Vienna case, a second procedure was necessary to reach densities higher than those in the dataset (explaining why Vienna curves show λ and L values larger than the λ_f and L(C) values reported in Supplementary Information: Table S1b). We call that procedure supersampling, and extend an existing method 17.
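The subsampling step can be illustrated with a short sketch, assuming trips are held in a list and the target density is expressed in trips per hour. The function `subsample_trips` and the toy trip identifiers are hypothetical and only show the uniform sampling without replacement described above.

```python
import random


def subsample_trips(trips, timespan_hours, target_rate, rng):
    """Uniformly subsample trips down to about target_rate trips per hour."""
    n_target = round(target_rate * timespan_hours)
    if n_target > len(trips):
        # Densities above the observed one cannot be reached by subsampling.
        raise ValueError("target density exceeds the dataset; supersampling is needed")
    return rng.sample(trips, n_target)  # uniform, without replacement


rng = random.Random(42)
all_trips = list(range(24_000))  # hypothetical trip ids recorded over 24 h
sparse = subsample_trips(all_trips, 24.0, 250.0, rng)
print(len(sparse), "trips kept out of", len(all_trips))
```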
The method of ref. 17 interpolates trips from a given sample in a static manner in time. It is based on inferring a city's invariant collection of transition probabilities {p_ij}, where ij enumerates all possible intersection pairs. This collection is normalized (Σ_ij p_ij = 1), and p_ij represents the probability that a given trip is generated at intersection i and ends at intersection j. The collection has been shown 17 to be extremely stable in time, and a procedure has been developed to infer the complete set of values (note that in general p_ij ≠ 0 for all i and j). Once this collection of values is obtained, for a given density (total number of trips T_trips generated in a given timespan τ) the allocation of trips to each intersection pair ij reads ⟨t_ij⟩ = T_trips p_ij, with t_ij an integer random variable following a Poisson distribution.
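The static allocation can be sketched as independent Poisson draws with mean T_trips p_ij for each intersection pair. The transition probabilities below are hypothetical, and the Poisson sampler is Knuth's multiplication method, chosen only to keep the sketch within the standard library; it is adequate for small means and is not the implementation used for the paper.

```python
import math
import random


def poisson(mean, rng):
    """Knuth's multiplication method; suitable for small means only."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def allocate_trips(p, total_trips, rng):
    """Draw t_ij ~ Poisson(total_trips * p_ij) for every intersection pair."""
    return {pair: poisson(total_trips * p_ij, rng) for pair, p_ij in p.items()}


# Hypothetical transition probabilities over three intersection pairs.
p = {(0, 1): 0.5, (1, 0): 0.3, (1, 2): 0.2}
rng = random.Random(1)
draws = [allocate_trips(p, 20, rng) for _ in range(2000)]
mean_total = sum(sum(d.values()) for d in draws) / len(draws)
print(f"average number of allocated trips: {mean_total:.2f}")
```

Note that the total number of allocated trips fluctuates from draw to draw; only its expectation equals T_trips.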
We have extended the method of ref. 17 to allow for dynamic supersampling in time. Algorithm 1 (see Supplementary Information, Algorithm S1) exploits the exponential nature of inter-event times between trips (see Supplementary Information, Fig. S5) coupled with the statistics of daily and hourly trip generation (see Supplementary Information, Fig. S6). For every day, the algorithm distributes the empirical number of daily generated trips T_d over hourly intervals according to the empirical probability q_h = Ť_h / Σ_ĥ Ť_ĥ (where Ť_h is the average number of trips observed during hour h and 0 ≤ h, ĥ ≤ 23) and then distributes the generated trips over intersections according to p_ij. Finally, for each intersection, the allocated trips are distributed in time according to a Poisson process. The code for this extension was made public 28.
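The three stages of the dynamic supersampling (hours, intersection pairs, times within the hour) can be sketched as below. This is a simplified illustration, not Algorithm S1: the function `supersample_day`, the hourly profile, and the transition probabilities are hypothetical, and the Poisson process within each hour is realized by drawing uniform event times, relying on the fact that, conditional on the number of events in an interval, the event times of a homogeneous Poisson process are independent and uniform on that interval.

```python
import random


def supersample_day(daily_trips, hourly_means, p, rng):
    """Generate (time_in_hours, origin, destination) tuples for one day."""
    hours = list(range(24))
    total = sum(hourly_means)
    q = [mean / total for mean in hourly_means]  # q_h
    pairs = list(p)
    weights = [p[pair] for pair in pairs]        # p_ij
    trips = []
    # Stage 1: split the daily total over the 24 hourly intervals.
    for hour in rng.choices(hours, weights=q, k=daily_trips):
        # Stage 2: assign the trip to an intersection pair.
        origin, destination = rng.choices(pairs, weights=weights, k=1)[0]
        # Stage 3: place the trip uniformly at random within its hour.
        trips.append((hour + rng.random(), origin, destination))
    trips.sort()
    return trips


# Hypothetical hourly profile (night, day, evening) and transition probabilities.
hourly_means = [5.0] * 6 + [20.0] * 12 + [10.0] * 6
p = {(0, 1): 0.5, (1, 0): 0.3, (1, 2): 0.2}
day = supersample_day(500, hourly_means, p, random.Random(7))
print(day[0], day[-1])
```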