The cost of non-coordination in urban on-demand mobility

Over the last 10 years, ride-hailing companies (such as Uber and Grab) have proliferated in cities around the world. While generally beneficial from an economic viewpoint, having a plurality of operators that serve a given demand for point-to-point trips might induce traffic inefficiencies due to the lack of coordination between operators when serving trips. In fact, the efficiency of vehicle fleet management depends, among other things, on the density of demand in the city, and in this sense having multiple operators in the market can be seen as a disadvantage. There is thus a tension between having a plurality of operators in the market and overall traffic efficiency. To date, there has been no systematic analysis of this trade-off, which is fundamental to designing the best future urban mobility landscape. In this paper, we present the first systematic, data-driven characterization of the cost of non-coordination in urban on-demand mobility markets by proposing a simple, yet realistic, model. This model uses trip density and average traffic speed in a city as its inputs, and provides an accurate estimate of the additional number of vehicles that must circulate due to the lack of coordination between operators, i.e. the cost of non-coordination. We plot this cost across five cities: Singapore, New York (limited to the borough of Manhattan in this work), San Francisco, Vienna and Curitiba. We show that, due to non-coordination, each additional operator in the market can increase the total number of circulating vehicles by up to 67%. Our findings could help city policy makers make data-supported decisions when regulating urban on-demand mobility markets. At the same time, our results highlight the need for more proactive government participation and for new, innovative solutions that would enable better coordination of on-demand mobility operators.

General information about the characteristics of different datasets.

Calculating a typical week
For each workday (i.e. Monday to Friday) in each city, we calculated the average number of trips. We then went through our data and, for each week, calculated the total difference between that week and the calculated average. We sorted the weeks in descending order and chose the first three, which we call typical weeks.
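The selection of typical weeks described above can be illustrated with a minimal sketch. Several details are assumptions made only for this illustration and are not stated in the text: the "total difference" is taken as the sum of absolute differences over the five workdays, "typical" is read as closest to the per-weekday average (smallest total difference), weeks are identified by their ISO year and week number, and incomplete weeks are discarded. The function name `typical_weeks` and the input format are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Tuple


def typical_weeks(daily_trips: Dict[date, int], k: int = 3) -> List[Tuple[int, int]]:
    """Return the k ISO (year, week) pairs closest to the per-weekday average.

    Assumption: closeness is the sum of absolute workday differences.
    """
    # Collect workday trip counts per week: week -> {weekday index: trips}.
    weeks: Dict[Tuple[int, int], Dict[int, int]] = defaultdict(dict)
    for day, count in daily_trips.items():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            iso = day.isocalendar()
            weeks[(iso[0], iso[1])][day.weekday()] = count

    # Assumption: only weeks with all five workdays present are compared.
    complete = {w: c for w, c in weeks.items() if len(c) == 5}
    if not complete:
        return []

    # Average number of trips for each workday across all complete weeks.
    avg = {d: sum(c[d] for c in complete.values()) / len(complete) for d in range(5)}

    # Total absolute difference between each week and the average profile.
    deviation = {
        w: sum(abs(c[d] - avg[d]) for d in range(5)) for w, c in complete.items()
    }
    return sorted(deviation, key=deviation.get)[:k]


# Example: four weeks starting Monday 2021-01-04; the last week is an outlier.
start = date(2021, 1, 4)
counts = {start + timedelta(days=i): (200 if i >= 21 else 100) for i in range(28)}
print(typical_weeks(counts))
```

In the example, the fourth week has twice as many trips as the others, so it has the largest total difference from the average and is not among the three selected weeks.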

Materials and methods
The methodology used in this paper offers a simple but realistic approximation of ride-hailing operations. The included cruising step ensures that the distribution of idle vehicles asymptotically follows the distribution of demand in the city. Note that although our simulation assumes that drivers are on the road for a whole day (i.e. a 24-hour period), it could easily be extended to consider shorter shifts and changes of drivers at given times of day. Furthermore, since trip demand is typically not constant during the day, we expect that different times of day could require different effective fleet sizes. A further extension of our work could consider "covering" the day with fixed-length (e.g. 8-hour) driver shifts in an optimal way to serve at least N_min trips. We consider this analysis, however, to be beyond the scope of the current work.
An alternative approach, which could give a more concrete understanding of specific fleet needs in a city, could aim at finding the fleet size that gives sufficient performance for a specific sequence of trips. In this way, any real data can be thought of as training data instead of test data. Nevertheless, if the strategy is generic (i.e. it is not influenced by the daily variation of demand and is trained on a sufficiently large set of days), the result can still provide a good understanding of the possibility of serving trips with a given efficiency. Following this approach, after defining the strategies for fleet management, we used a binary search to find the minimum fleet size that provides an adequate level of service for a concrete set of trips. The whole procedure of finding the minimum fleet size to serve the given demand is outlined in Algorithm 1.
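As an illustration of the binary search described above, the following minimal sketch finds the smallest fleet size for which at least a fraction z of trips is served within a maximum waiting time T_wait. It is not the simulation used in this work: the dispatcher `served_fraction` is a hypothetical toy model that ignores geography, cruising and travel to the pickup point, and the search assumes that the service level does not decrease when vehicles are added. The names `served_fraction` and `min_fleet_size` are introduced here only for illustration.

```python
import heapq
from typing import Callable, List, Tuple

Trip = Tuple[float, float]  # (request time, trip duration), both in minutes


def served_fraction(trips: List[Trip], fleet_size: int, t_wait: float) -> float:
    """Toy first-come-first-served dispatcher (assumption: no geography).

    A request is served if some vehicle is free no later than t_wait
    after the request time; otherwise the request is dropped.
    """
    if fleet_size <= 0 or not trips:
        return 0.0
    free_at = [0.0] * fleet_size  # min-heap of times at which vehicles are free
    heapq.heapify(free_at)
    served = 0
    for request, duration in sorted(trips):
        earliest = free_at[0]
        if earliest <= request + t_wait:
            start = max(earliest, request)
            heapq.heapreplace(free_at, start + duration)
            served += 1
    return served / len(trips)


def min_fleet_size(
    trips: List[Trip],
    z: float,
    t_wait: float,
    simulate: Callable[[List[Trip], int, float], float] = served_fraction,
) -> int:
    """Smallest fleet size with simulate(trips, n, t_wait) >= z.

    Assumption: the service level is non-decreasing in the fleet size.
    """
    if not trips:
        return 0
    lo, hi = 1, len(trips)  # one vehicle per trip as the upper bound
    if simulate(trips, hi, t_wait) < z:
        raise ValueError("target service level not reachable with one vehicle per trip")
    while lo < hi:
        mid = (lo + hi) // 2
        if simulate(trips, mid, t_wait) >= z:
            hi = mid  # mid vehicles suffice; try fewer
        else:
            lo = mid + 1  # mid vehicles are not enough
    return lo


# Three simultaneous 10-minute trips, all to be served within 5 minutes.
print(min_fleet_size([(0.0, 10.0)] * 3, z=1.0, t_wait=5.0))
```

Any other dispatching strategy can be plugged in through the `simulate` argument, as long as it returns the fraction of trips served for a given fleet size.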
Algorithm 1. Basic binary search algorithm to find the minimum fleet size given a dispatching strategy, a set of trips and target measures z and T_wait.
Since trip requests arrive in an arbitrary manner, at any time, we need to have a set of available drivers ready to be assigned to any new request. Specifically, if we would like to ensure that trip requests can be served within a maximum waiting time T_wait with high probability, then one of the following needs to be true: (1) there are trips already in progress that will finish within T_wait, in a location that allows the driver to reach the new passenger in a short time; or (2) there are idle drivers distributed in the city such that the new passenger's location can be reached by at least one driver within T_wait. The first possibility depends primarily on the origin-destination distribution of the trips; operators' dispatching choices only influence the extent to which optimisation opportunities are exploited, and cannot result in more efficient operations than the structure of trip demand in the city allows. The second possibility essentially corresponds to deploying a "standby fleet" to cover areas of the city where imbalances in trip origins and destinations would otherwise leave requests unserved. The size of this standby fleet is primarily determined by the geometry of the city, the traffic speed, and the structure (but not the density) of demand; we therefore represent it by the constant term B. In general, we expect B to scale linearly with city area, i.e. B = b|C| for city C.
Given the relationship between the number of trips and the fleet size from Eq. (2) of the main text, we can derive a simple analytical expression for the fleet size factor described by Eq. (1), in which n_T denotes the average trip density, i.e. N_T = n_T |C|. Dividing through by the city size |C|, we use b as the size-independent, scaled version of the constant B from Eq. (2).
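The steps of such a derivation can be sketched as follows. The sketch rests on assumptions that are not spelled out in this section: Eq. (2) is taken to be linear, N_F = a N_T + B, where the symbol a for the slope is hypothetical; the demand N_T is assumed to be split evenly among m operators, each of which needs its own standby fleet B; and the fleet size factor f(m) is taken to be the ratio of the total fleet size with m operators to the fleet size with a single operator.

```latex
\begin{align}
N_F(N_T) &= a\,N_T + B, \qquad B = b\,|C|, \qquad N_T = n_T\,|C| \\
f(m) &= \frac{m\,N_F(N_T/m)}{N_F(N_T)}
      = \frac{a\,N_T + m\,B}{a\,N_T + B} \\
     &= \frac{a\,n_T + m\,b}{a\,n_T + b}
      = 1 + \frac{(m-1)\,b}{a\,n_T + b}
\end{align}
```

Under these assumptions, the factor depends on the city only through the trip density n_T and the scaled constant b, and it approaches 1 as the trip density grows.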

Generating trips based on a random model
Besides the real trips, we further use the model presented in refs. 3 and 4 to generate trips in a random process. This allows us to generate trips with different presumed densities, and thus to ask questions such as "What if we start from a larger number of initial trips?" or "What if there is a larger demand pool originally?". It also allows us to test more thoroughly how the factors in the previous model depend on the density of trips.
For each city, we generated 9 synthetic datasets, using the distribution of trips inferred from the real data; for each synthetic dataset, we generated 15 days of trips, similarly to the real data. This means that for each city, we had a total of 10 datasets (one original and 9 synthetic). For each dataset, we confirmed that fleet size factors can be modeled in the form presented as Eq. (3) in the main text, and calculated the best-fit D parameter. In Figs. S2, S3 and S4, we display these D values as a function of average trip distance, average trip duration, and average traffic speed respectively. Points are grouped by color based on trip density, while points with the same x-axis value correspond to simulations carried out in the same city (as we have 5 cities in the dataset, each of the figures has five distinct values on the x-axis).

Effect of peak daily trip rate
Our main analysis considers the number of trips in a day as the main starting point for modeling fleet size requirements and the effect of market segmentation. While this is a simple and straightforward measure that can easily be applied to any city, an additional complexity comes from the fact that the distribution of demand within a day is not uniform. This means that the required fleet size will likely be affected by the peak demand in a day. To assess the importance of this, we calculated a measure of "peak demand" for each day in our datasets as the hourly maximum number of trips, and a measure of "peak utilization" as the maximum of the hourly total trip duration. We display the estimated fleet size as a function of these variables in Figs. S6 and S7. We see that this dependence can also be well approximated by a linear relationship, with significant variation in the coefficients across cities.
Figure S4. Constant D as a function of the average travel speed in the synthetic datasets. Each set of points belongs to a set of results with a given density, displayed in the legend as trips per day per km². We observe that the constant D decreases monotonically as the traffic speed increases.
Figure S8. Linear fit of fleet size as a function of daily trip numbers in the oracle model. Note that the fleet sizes displayed in this figure are those required to serve all trips without any delay, and can thus be larger than the fleet sizes in the FCFS model, which are only required to serve 95% of trips with a maximum delay of 5 min. Extending the oracle model to select an "ideal" subset of trips or to allow flexible delays for trip start times leads to a prohibitive increase in the combinatorial complexity of the problem (ref. 5).
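The "peak demand" and "peak utilization" measures defined in this section can be computed along the following lines. This is a minimal sketch under stated assumptions: hours are taken as clock hours, and the full duration of a trip is attributed to the hour in which it starts, which is a simplification chosen here and not necessarily the convention used for Figs. S6 and S7. The function name `daily_peaks` and the input format are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


def daily_peaks(
    trips: Iterable[Tuple[datetime, float]],
) -> Dict[date, Tuple[int, float]]:
    """Map each day to (peak demand, peak utilization).

    trips: (start time, duration in minutes) pairs.
    Peak demand is the largest number of trips starting in any clock hour;
    peak utilization is the largest hourly sum of trip durations, with each
    duration attributed to the hour of the trip start (assumption).
    """
    counts: Dict[Tuple[date, int], int] = defaultdict(int)
    durations: Dict[Tuple[date, int], float] = defaultdict(float)
    for start, duration_min in trips:
        key = (start.date(), start.hour)
        counts[key] += 1
        durations[key] += duration_min

    peaks: Dict[date, Tuple[int, float]] = {}
    for (day, hour), n in counts.items():
        best_n, best_d = peaks.get(day, (0, 0.0))
        peaks[day] = (max(best_n, n), max(best_d, durations[(day, hour)]))
    return peaks


# Example: two short trips start in the 08:00 hour, one long trip at 09:05.
example = [
    (datetime(2021, 3, 1, 8, 10), 10.0),
    (datetime(2021, 3, 1, 8, 50), 20.0),
    (datetime(2021, 3, 1, 9, 5), 45.0),
]
print(daily_peaks(example))
```

In the example, the two measures peak in different hours: the demand peak is in the 08:00 hour (two trips), while the utilization peak is in the 09:00 hour (45 minutes of trip time).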