Introduction

Car parking is of central importance to congestion on roads and their social, environmental and economic impacts on society. An imbalance between on- and off-street parking prices for instance leads to cruising for cheaper on-street parking lots which in turn is responsible for 8–74% of traffic in downtown areas1. If we consider that cars that burn fuels other than electricity are directly responsible for 14% of global greenhouse gas emissions2 and 3.3 million premature annual deaths worldwide3, we find that there exist large potentials for environmental savings. Knowing where cars are parked at what times could support urban planning in the process of optimizing road networks, pricing parking lots and planning land use.

The electrification of the mobility sector makes it even more important to know where cars are parked at what times. At times and locations where large numbers of cars are parked with high density and must charge simultaneously for their upcoming trips, their additional electricity consumption can cause stresses to the local grid4. On the other side, the batteries of parked electric cars can be valuable storage capacities that can be used for balancing grid operation with large shares of intermittent renewable energy sources5,6,7. For an efficient placement, sizing and grid connection of charging stations, it is therefore important to know the spatio-temporal distribution of car parking densities on the scale of entire cities8,9,10.

The existing literature on urban traffic does not provide data on spatio-temporal parking densities on the scale of entire cities. Classic traffic system research aims at the development of optimal transport networks by minimizing congestion on roads11,12. The existing theories focus on the stream variables speed, flow and concentration of vehicles13,14,15,16; in these, density always refers to the concentration of vehicles on roads. Car parking density maps instead would require data about the number of cars parked in each region of a city at various times of a day. We find that the main barriers for measuring such data directly are the costs and efforts of placing and operating sensors.

In this analysis, we explore if car parking density maps on the scale of entire cities can be estimated from travel time measurements among different zones of a city only. We formulate a Hidden Markov Model in which states are the locations of cars and emission measurements are the changing travel times among the zones of that city throughout a day. We apply the model to travel time data that is measured from undertaken Uber rides17 and generate the desired parking density maps for 34 cities around the globe. We further provide complete code and instructions on extending the presented model for a wider range of traffic and parking system analyses18.

Results

We generate car parking density maps from Uber travel time data for 34 cities worldwide19. We formulate non-linear functional relationships between the changing average travel times among the zones of a city and both the traffic activity and flow direction probabilities of cars. We derive origin-destination matrices for each hour of the day and for all zones of an entire town with these functional relationships. We then sample the traffic flow for 1,000 cars per city zone for each city from the resulting probability distributions and normalize the resulting spatial distribution of cars in each time step. The resulting parking density maps are hence independent from the number of sampled cars and can be scaled to arbitrary vehicle fleet sizes.

The resolution of the generated maps in time is one hour. The time periods for which we generate the results depend on the availability of the underlying Uber travel time measurements. At the time of writing this, Uber provides travel time data for the years 2015–2018 and distinguishes these by weekdays, weekends and the quarter of a year. In addition to these, Uber also provides travel time statistics that are collected regardless of the day type; these datasets cover a larger variety of trips than the separated datasets and therefore have lower sparsity.

The resolution of the generated maps in space varies and depends on how Uber divides cities into different zones. Figure 1 shows the different scales at which the generated parking maps can be used. The parking maps can be used for an analysis of the entire suburban area of a town (Fig. 1a,b) or the center of a town (Fig. 1c); the accuracy is also adequate for an analysis of parking densities within the city center of a town (Fig. 1d).

Fig. 1
figure 1

A scatter plot of the generated car parking densities for Melbourne on weekdays of the third quarter of 2017 (9:00 am) that shows the different spatial scales at which the results can be analyzed. The size of a blue dot indicates the density of parked cars in the respective city zone. The zoom factor increases within the marked black windows (zoom windows) from (a) to (d).

We subsequently validate the presented results in two steps for the city of Melbourne during the years 2015–2017 for both travel time data that is separated by day types and those that are not separated by day types. In a first step, we validate the circadian rhythm of traffic activity that we generate by our model. In a second step, we validate the car parking densities that result from our traffic flow model for 100–105 city zones; the city zones that we validate are given by the location of the operated underground parking sensors in Melbourne. Our choice of Melbourne for validation is arbitrary and motivated by the availability of measured traffic count and parking density data; we make this choice independent from the performance of our model.

Validation of traffic activities

A first substantial hypothesis in our model is, that whether a car drives to another city zone or stays parked in its origin zone is a function of changing travel times from the car’s current location to all possible destination zones throughout a day. This means, that the higher the measured travel times at a certain time of a day in a particular city zone are, the more likely it is that a car will undertake a trip to another destination at that given time of the day. This generates a characteristic circadian rhythm of traffic activity for each sampled day. We validate our sampled circadian rhythms with vehicle count data from street segments in the city of Melbourne that we use as an indicator for traffic activity20.

Table 1 contains the numeric results of the validation for the years 2015 until 2017. The percentual fit between the modeled and the measured circadian rhythm is 94–99% for weekdays and 93-99% for weekends. We observe that the fit is higher for weekdays when using mean travel time data that is not separated by day types; these datasets have usually lower sparsity than those that are separated by day types. Except for the last three quarters of the year 2016, the percentual validation fit is larger for weekends if one uses the travel time data that is separated by day types than when using non-separated data.

Table 1 Validation of traffic activity.

Figure 2 visualizes the results exemplarily for the first quarter of the year 2017. We can see that our sampled traffic activity is shifted towards the weekday patterns when using travel time data that is not separated by day type and therefore deviates from the actually measured traffic patterns when validated for measurements on weekends (Fig. 2b). For all other datasets (Fig. 2a,c,d), the sampled and measured traffic activities match with very high accuracy. The circadian rhythm of urban traffic has four characteristic features that are similar for all larger cities around the world21,22,23,24,25: first, a relatively high morning rush hour traffic between 8:00–10:00 am; second, a moderate lunch time traffic between 11:00 am–1:00 pm; third, a peak evening rush hour traffic between 4:00–6:00 pm; fourth, a relatively low midnight traffic between 1:00–3:00 am. We can observe that both the measured and the sampled circadian rhythms contain the four characteristic features of urban traffic activity.

Fig. 2
figure 2

The measured (blue) and sampled (red) circadian rhythms of traffic activity for different datasets of Melbourne in the first quarter of the year 2017. (a) Weekdays with travel time data that is not separated by day type and a fit of 99%. (b) Weekends with travel time data that is not separated by day type and a fit of 95%. (c) Weekdays with travel time data that is separated by day type and a fit of 98%. (d) Weekends with travel time data that is separated by day type and a fit of 99%.

Validation of parking densities

The second substantial hypothesis in our model is that the destination zone that a car will choose for a trip is a function of the traffic activity in that zone at that given time of the day. This means, that the higher the measured travel times to a particular city zone are, the more likely it is that cars will choose that zone as a destination. This generates characteristic flow directions of cars and, together with the first hypothesis, the parking density of cars among the zones of a city. We validate the characteristics of our modeled car parking densities with underground parking sensor measurements in about 100 from the total of 2,357 modeled city zones in Melbourne20. The validated city zones are given by the location of the operated underground sensors which is mostly around the city center of Melbourne.

Table 2 contains the numeric results of the validation for the years 2015 until 2017. The percentual fit between sampled and measured parking densities is in a similar range for weekdays and weekends. We observe average fits of 82–92% for weekdays and 82–90% for weekends. The minimum fit is 64–83% for weekdays and 53–82% for weekends. The maximum fit is 94–97% for weekdays and 95–99% for weekends. The fit between the sampled and measured parking densities during weekdays is higher for the travel time data that is not separated by day type than for the separated data from 2015 until the first quarter of 2016. For the following time periods, the percentual fit is higher for the separated travel time datasets than for those that do not separate the data by day type. This turning point aligns with a larger decrease in the sparsity of the underlying travel time data for Melbourne from the first to the second quarter of 2016 (Table 3). During weekends, the percentual fit is always larger or equal for the non-separated datasets compared to the separated ones.

Table 2 Validation of parking densities.
Table 3 The sparsity of the currently provided Uber travel time datasets in percentage. Cities starting with A – M. The variable N indicates the total number of city zone polygons in which the respective city is divided.

Figure 3 further visualizes the validation results for an exemplar city zone during the fourth quarter of the year 2016 for both the non-separated (Fig. 3a,b) and the separated (Fig. 3c,d) datasets. We can observe that the patterns of parking density better match with the non-separated travel time data for both weekdays and weekends (Fig. 3a,b) than with the separated ones (Fig. 3c,d). In these, both the sampled (red) and measured (blue) parking densities rise, reach their peaks and decline at the same time. With the datasets that are separated by day types (Fig. 3c,d), our modeled (red) rise, peak and decline phases of parking density mismatch with the measured (blue) values.

Fig. 3
figure 3

The measured (blue) and sampled (red) parking densities for different datasets of Melbourne in the exemplar city zone 956 in the fourth quarter of the year 2016. (a) Weekdays with travel time data that is not separated by day type and fit of 97%. (b) Weekends with travel time data that is not separated by day type and fit of 94%. (c) Weekdays with travel time data that is separated by day type and fit of 90%. (d) Weekends with travel time data that is separated by day type and fit of 94%.

Discussion

We explore, if temporal car parking density maps can be modeled on the scale of an entire city, given travel time measurements between the zones of that city only. Using the Hidden Markov traffic Model that we present and the travel time data that is published by Uber, we find that parking densities can be estimated with about 90% accuracy for both weekdays and weekends. The circadian rhythm of daily commuting traffic can further be sampled with an accuracy of over 93% with our model. Although Uber users may not take an Uber ride for their daily commute but rather for extraordinary trips, which creates a large sparsity and bias in most of the data, we find that the measured travel time data is most of the time representative enough for creating the desired car parking density maps on the scale of entire cities.

The research question we ask is new and the data that we generate is the first of its kind in the literature. In contrast to classic traffic system research, our theory estimates the density of car parking by sampling individual driving behaviour. It contains elements of the Cumulative Vehicle Count Curves (N-curves)26 and Wardrop’s second principle of equilibrium27. Our theory, however, differs in the fundamental hypothesis that the density of cars parked in an area is a function of the changes in travel time between the origin and destination pairs that relate to this area throughout a day.

Our results are consistent with those of previously performed traffic system analyses21,22,23,24,25. The parking densities and commuting trends that we model based on our central hypotheses mostly match with those that are measured in the city of Melbourne20. We choose Melbourne for the validation of our results based on the parking and traffic data that is publicly available.

The performance of the presented model depends on the sparsity of the origin-destination travel time matrices that we use to sample the traffic flow of cars. We can observe that the travel time data that is collected without separation by weekdays and weekends has a lower sparsity and leads to more accurate parking density trends (Fig. 3a,b) than the travel time data that is separated by weekdays and weekends (Fig. 3c,d). For the modeled circadian rhythms of traffic activity, however, the opposite holds: the traffic activity features at weekends are better modeled with the travel time data that is separated by day types (Fig. 2d) than when using the non-separated data (Fig. 2b). One major reason is that the travel time statistics for the five weekdays have greater weight in the non-separated datasets than the two weekend days. However, this does not hold for the datasets of the year 2016 which could again be caused by biases that are given through the sparsity of the datasets. City zones can further behave as ever growing sinks if more cars flow into these than out of these; this is again caused by the sparsity in the underlying data. Further research can be done on reducing the sparsity of the underlying travel time data by using e.g. satellite imagery to estimate the missing travel time matrix entries of the Uber travel time data. This could reduce the ever growing sink characteristics of zones and further biases in the traffic flow that are given by the Uber data.

Figure 4 visualizes the generated parking density maps for three more cities at the times of their largest diversity. The diversity of the visualized parking maps implies the importance of our generated data for the electrification of the mobility sector: policies that are found to be effective for charging electric cars in one city can be useless in other cities due to different patterns of car parking. On the other side, our data confirms that also generally applicable policies exist: the commuting behaviour of car users creates high parking densities at commercial centers during working hours. This always includes the times of peak solar power generation during midday at which grid balancing services of electric car batteries could be most useful.

Fig. 4
figure 4

The parking density maps of Boston, Bogota and London at the times of their largest diversities. Each dot represents a city zone and the size of each dot represents the number of cars parked in the respective city zone at the given time. The maps on the left column represent parking at 4:00 am (a,c,e). The maps on the right column represent parking at 9:00 am (b,d,f).

Methods

We are given the arithmetic mean of hourly travel time measurements between different zones of a city (Fig. 5a) and want to estimate the traffic flow and spatial parking distribution of cars in that city. In a first stage, we estimate the probabilities of car traffic between zones as a function of mean travel times (Fig. 5b). These probabilities exploit the changes in mean travel time between the zones of the city throughout a day to approximate information about when cars would drive and where they would drive to. In a second stage, we sample individual car traffic from these probability distributions and determine the number of cars that are parked in a zone as a function of cars flowing in and out from that zone (Fig. 5c). In a third and last stage, we use validation results to tune the parameters of the probability distributions that we sample from (Fig. 5d). For each model parameter or each set of model parameters that we can freely choose, a set point value is used to make a good choice. A set point value can for instance be the evaluation error between sampled and measured parking densities and traffic activity, or the average number of trips and travel distances.

Fig. 5
figure 5

The core stages of the modeling method. (a) The city of interest (here e.g. Melbourne) is divided into zones for which we are given mean travel time data for a subset of these zones and day times. (b) A graph representation of the city zones. Each node stands for a zone and each value incorporated in the node stands for the probability that a car located in that zone drives. The probabilities of choosing a certain destination zone are assigned to each latency of the graph. The flow of cars between city zones is sampled from the joint probabilities of driving and choosing a destination zone. (c) The distribution of cars parked in each zone in each time step is calculated as a function of traffic flow between all zones. (d) A set of gradient descent algorithms performs model selection by tuning the open model parameters.

Uber travel time data

The Uber Movement project provides statistical data about travel times between different zones of a city. At the time of writing this, data is available for 34 cities worldwide with one second resolution. The data distinguishes between each quarter of several years. Depending on the city of interest, data is available from the years 2015 until 2018. For each quarter of these years, three types of datasets are available. The first type of dataset contains the aggregated measurements of travel time during weekdays and weekends. The second type of dataset contains measurements for weekdays only, and the third type of dataset contains measurements of weekends only. Detailed information on how the statistics were derived can be retrieved from Uber’s official methodology paper28. The data includes both central and suburban regions of a city which are divided into up to 5,260 zones. The raw travel data, as it is published by Uber, consists of four entry types. We let N be the number of zones in which the city is divided and T be the number of discrete time steps in which one day is divided. The four entry types of the travel data can then be defined for all t = 1…T and i, j = 1…N as:

  • μij,t: = the mean travel time from zone i to zone j at daytime t

  • σij,t: = the standard deviation of travel time from zone i to zone j at daytime t

  • geo_μij,t: = the geometric mean travel time from zone i to zone j at daytime t

  • geo_σij,t: = the geometric standard deviation of travel time from zone i to zone j at daytime t

The sparsity of data

The sparsity of the datasets plays an important role for the performance of the here presented model. These sparsities depend on the number of Uber users in the provided cities and the provided time periods. Measurements are only given between zones and at times, in which a sufficient amount of Uber rides were undertaken, so as to ensure a sufficiently good representation of overall traffic28. This naturally generates a bias in the data that is larger the less Uber rides were undertaken in the time period and city of interest. Tables 3 and 4 give an overview of the sparsity of the currently provided datasets. We can see that the sparsity is generally lower for the datasets that contain travel time measurements for both day types. We further see that the sparsity of the datasets fluctuates but generally decreases over time as the user base of Uber grows and more rides are measured between city zones at various times of the day. We define the sparsity of a dataset as one minus the available number of data pairs divided by the maximum possible number of data pairs.

Table 4 Continuation of the sparsity of the currently provided Uber travel time datasets in percentage. Cities starting with N – W. The variable N indicates the total number of city zone polygons in which the respective city is divided.

City zones and their distances

Each zone of a city is described by a polygon with up to several hundred vertices. A separate file provides the latitudinal and longitudinal coordinates of these vertices for each polygon of each city. We let Si be the number of vertices of the polygon of zone i, and xi,j and yi,j be the longitudinal and latitudinal coordinates of vertex j. We represent each city zone polygon by the centroid, also known as the center of gravity, of this polygon. We compute the area Ai of each city zone i and the coordinates of the i-th centroid (longi|lati) for all i = 1…N and j = 1…Si, where j = 0 represents the same vertex as j = Si, as:

$${A}_{i}=\frac{1}{2}\cdot \mathop{\sum }\limits_{j=0}^{{S}_{i}}\,({x}_{i,j}\cdot {y}_{i,j+1}-{x}_{i,j+1}\cdot {y}_{i,j})$$
(1)
$$lon{g}_{i}=-\frac{1}{6{A}_{i}}\cdot \mathop{\sum }\limits_{j=0}^{{S}_{i}}\,({x}_{i,j}+{x}_{i,j+1})\cdot ({x}_{i,j}\cdot {y}_{i,j+1}-{x}_{i,j+1}\cdot {y}_{i,j})$$
(2)
$$la{t}_{i}=-\frac{1}{6{A}_{i}}\cdot \mathop{\sum }\limits_{j=0}^{{S}_{i}}\,({y}_{i,j}+{y}_{i,j+1})\cdot ({x}_{i,j}\cdot {y}_{i,j+1}-{x}_{i,j+1}\cdot {y}_{i,j})$$
(3)

The easiest way to calculate the distance between two points on the planet is to assume that latitude and longitude are straight lines. Using a constant distance of 111.3 km between each latitudinal degree and an average distance of 71.5 km per longitudinal degree, we can simply calculate the distances with the Pythagorean theorem. A more accurate way of doing this is to consider that longitudinal lines are not simply straight but rather bend depending on the latitudinal position: the distance between two longitudinal lines is around 111.3 km at the equator and 0 at the north and south poles. The dependence on the latitudinal position lat can then be expressed as 111.3 km cos(lat). If we replace the average distance of 71.5 km per longitudinal degree with this expression within the Pythagorean theorem, we can calculate the distance dij between two zones i and j at the points (longi|lati) and (longj|latj) as:

$${d}_{ij}={d}_{ji}=111.3\,km\cdot \sqrt{co{s}^{2}\left(\frac{la{t}_{i}+la{t}_{j}}{2}\right)\cdot {(lon{g}_{i}-lon{g}_{j})}^{2}+{(la{t}_{i}-la{t}_{j})}^{2}}$$
(4)

Hidden markov traffic Model

We define the state of the traffic system as the distribution of cars among city zones. The state in each time step belongs to a finite set of possible states if the number of sampled cars is finite and constant. We can extend the state space of our traffic model by assigning more properties to a car than just its location. Such properties can for instance be the states of charge of electric car batteries or the number of transported persons. We describe the location of a car at a given time t = 1…T by its city zone and introduce:

  • Xt: = the distribution of cars among all city zones at time step t

If we let V be the number of cars that we want to simulate, the state space ΩX contains NV possible arrangements. Every state Xt is only dependent on its previous state Xt−1 which gives us a discrete-time Markov chain model. For the state transition probabilities at times t = 1…T − 1 it hence holds that:

$$P({X}_{t+1}| {X}_{t},{X}_{t-1},...,{X}_{1})=P({X}_{t+1}| {X}_{t})$$
(5)

We let the state transition of our model be given by the uncertain flow of cars among all city zones and denote this as \({\varepsilon }_{t}\). The number of possible state transitions is hence (NV)2. For each time step t = 1…T-1, we write:

$${X}_{t+1}={X}_{t}+{\varepsilon }_{t}$$
(6)

The transition probabilities can then be described by:

$$P({X}_{t+1}| {X}_{t})=P({X}_{t+1}-{X}_{t})=P({\varepsilon }_{t})$$
(7)

The problem that we want to solve can be formulated as finding the car flow probabilities \(P({\varepsilon }_{t})\) given the mean travel times μij,t only. This gives us a Hidden Markov Model as the number of cars flowing among city zones is not directly measured but derived as a function of the changing mean travel time throughout the day. The mean travel times are called emission measurements.

Probabilities of driving and parking

For describing whether a car drives to another zone or stays parked in the same zone, we introduce a binary random variable:

  • A: = binary random variable that describes whether a car drives or parks

    We denote the values that a random variable can assume in lower cased letters and let 0 stand for parking and 1 for driving, hence a Ωa = {0, 1}. We determine the probabilities for possible values that A can take by letting the driving activity of cars in a zone be a function of the sum of mean travel times from that zone to all other zones. We assign a probability of driving according to the sum of mean travel time out of a zone in each hour compared to its minimum and maximum values throughout the entire day. For this purpose, we introduce two parameters:

  • pmin: = the probability that a car will drive, given the minimum sum of mean travel time out of a zone

  • pmax: = the probability that a car will drive, given the maximum sum of mean travel time out of a zone

One can assign arbitrary initial values to pmin and pmax and perform a model selection based on validation results to make a good choice of these parameters. We later present an algorithm that utilizes validation results of average driving times for parameter tuning. Two conditions that pmin and pmax must satisfy are:

$${p}_{max} > {p}_{min}$$
(8)
$${p}_{min},{p}_{max}\in [0,1]$$
(9)

For each zone of the city, we derive a set of 24 Binomial probability distributions, that is one for each hour of the day. We define the probabilities of an event P(A = a) for zones i = 1…N and times t = 1…T as:

$$P(A=1)={p}_{i,t}^{drive}=\left\{\begin{array}{ll}{p}_{min}+({p}_{max}-{p}_{min})\frac{{\sum }_{j=1}^{N}\,{\mu }_{ij,t}-mi{n}_{t}\{{\sum }_{j=1}^{N}\,{\mu }_{ij,t}\}}{ma{x}_{t}\left\{{\sum }_{j=1}^{N}\,{\mu }_{ij,t}\right\}-mi{n}_{t}\left\{{\sum }_{j=1}^{N}\,{\mu }_{ij,t}\right\}} & ma{x}_{t}\left\{\mathop{\sum }\limits_{j=1}^{N}\,{\mu }_{ij,t}\right\} > 0\\ 0 & ma{x}_{t}\left\{\mathop{\sum }\limits_{j=1}^{N}\,{\mu }_{ij,t}\right\}=0\end{array}\right.$$
(10)
$$P(A=0)=1-{p}_{i,t}^{drive}$$
(11)

The high sparsity of our available datasets can likely create a bias if data is missing more frequently at particular times and between particular zones of a city than at others. Considering the distance between the set of zones and the number of zones in each time step for which data points are available, can significantly decrease these biases. With Mij,t being the subset of travel time data that is available at time step t, a formulation of Eq. (10) that is more robust against sparsity is given by:

$$P(A=1)={p}_{i,t}^{drive}=\left\{\begin{array}{ll}{p}_{min}+({p}_{max}-{p}_{min})\frac{1}{| {M}_{ij,t}| }\frac{{\sum }_{j\in {M}_{ij,t}}\,\frac{{\mu }_{ij,t}}{{d}_{ij}}-mi{n}_{t}\left\{{\sum }_{j\in {M}_{ij,t}}\,\frac{{\mu }_{ij,t}}{{d}_{ij}}\right\}}{ma{x}_{t}\left\{{\sum }_{j\in {M}_{ij,t}}\,\frac{{\mu }_{ij,t}}{{d}_{ij}}\right\}-mi{n}_{t}\left\{{\sum }_{j\in {M}_{ij,t}}\,\frac{{\mu }_{ij,t}}{{d}_{ij}}\right\}} & | {M}_{ij,t}| > 0\\ 0 & | {M}_{ij,t}| =0\end{array}\right.$$
(12)

Probabilities of choosing a destination

For describing which destination zone a car would choose for a trip, we introduce a second random variable:

  • B: = discrete random variable that describes the destination zone that a car would choose

We also let the popularity of destination zones be a function of changing travel times throughout a day and ask ourselves how likely it is that a car in zone i travels to zone j at a given point t in time. For this purpose, we compare the mean travel time of an origin and destination pair in each hour with its minimum and maximum values throughout the entire day. We hence assign a numeric value between 0 and 1 to each origin and destination pair and for each direction. Note that we do not use the sums of mean travel times here but rather perform a min-max scaling of the data in each time step. The closer this value is to one, the more popular a destination zone is at that given hour. We generate a valid Multinomial probability distribution whose probabilities sum up to 1 for each zone and each hour of the day by normalizing all values with a factor Zi,t. The set of values that B can assume is given as b Ωb = {1, ..., N}. We define the probability P(B = b) that a car in zone i = 1…N chooses a destination zone j = 1…N at time t = 1…T and their respective normalization factors as:

$$P(B=j)={p}_{ij,t}^{dest}=\left\{\begin{array}{ll}\frac{1}{{Z}_{i,t}}\frac{{\mu }_{ij,t}-mi{n}_{t}\{{\mu }_{ij,t}\}}{ma{x}_{t}\{{\mu }_{ij,t}\}-mi{n}_{t}\{{\mu }_{ij,t}\}} & ma{x}_{t}\{{\mu }_{ij,t}\} > 0\\ 0 & ma{x}_{t}\{{m}_{ij,t}\}=0\end{array}\right.$$
(13)
$${Z}_{i,t}=\mathop{\sum }\limits_{j=1}^{N}\,{p}_{ij,t}^{dest}$$
(14)

Joint origin-destination probabilities

The desired transition probabilities \(P({\varepsilon }_{t})\) of the traffic flow can now be described by the joint events of driving and choosing a destination. For a simplified notation of these joint probabilities, we introduce a third random variable:

  • C: = discrete random variable that describes the joint events of driving (A = a) and choosing a destination (B = b)

We describe the state space of C with c ΩC = {0, ..., N}. A value of zero means that a car does not drive (A = 0) and hence stays parked in its origin zone. Any other value indicates that a car is chosen to drive (A = 1) and chooses the respective destination zone (B = 1…N) that is given by the value of C other than zero. For a simplified notation and calculation, we assume the random variables A and B to be independent and write:

$$P(A=a,B=b)=P(A=a)\cdot P(B=b)$$
(15)

Given the independence of A and B, we calculate the probabilities of cars in zone i = 1…N to drive to a destination zone j = 1…N or stay parked at time t as:

$$P(C=0)=P(A=0)=(1-{p}_{i,t}^{drive})\cdot {\underbrace{\mathop{\sum }\limits_{j=1}^{N}{p}_{ij,t}^{dest}}}_{=1(normalized)}=(1-{p}_{i,t}^{drive})\quad (Marginalization)$$
(16)
$$P(C=j)=P(A=1,B=j)={p}_{i,t}^{drive}\cdot {p}_{ij,t}^{dest}$$
(17)

Note that the same holds for both the arithmetic and geometric mean of travel time. We summarize that the uncertain flow of cars among the zones of a city \({\varepsilon }_{t}\) is modeled as a function of mean travel times μij,t and described by the probability distribution of P(C = c) as:

$$P({\varepsilon }_{t}) \sim {p}_{ij,t}^{joint}=\left\{\begin{array}{cc}1-{p}_{i,t}^{drive} & c=0\\ {p}_{i,t}^{drive}\cdot {p}_{ij,t}^{dest} & c > 0\end{array}\right.$$
(18)

Initial value problem

To sample the traffic flow, we need to define a realistic initial state of the system. The problem of finding such a state is commonly referred to as the Initial Value Problem. We introduce:

  • X0: = the initial distribution of cars among all city zones

We uniformly assign a location zone to each car and calculate the state transitions for an entire day. We expect the initial uniform distribution of cars to converge towards a naturally shaped distribution after all transition probabilities are applied to the traffic system. We then assume the last state of an entirely modeled traffic day as the solution to the Initial Value Problem and set:

$${X}_{0}\leftarrow {X}_{T}$$
(19)

State transition properties

To analyze any values of interest, we can assign state transition properties to each undertaken trip. Here, we let the transition properties of our model be the travel distance and duration of each trip. Similar to the state of the system, also the state transition properties can be extended. Such extensions can for instance be carbon emissions, fuel consumption and transported amount of people for each trip. We introduce for all v = 1…V, i, j = 1…N and t = 1…T:

  • \({T}_{v,ij,t}^{dur}\): = the duration of a trip undertaken by car v from zone i to zone j at time t

  • \({T}_{v,ij,t}^{dis}\): = the distance of a trip undertaken by car v from zone i to zone j at time t

As for many other natural processes, we assume the stochastic distribution of travel duration and distance to be Gaussian29. We use the mean (μij,t) and standard deviation (σij,t) from the original datasets to describe the stochastic distributions of \({T}_{v,ij,t}^{dur}\); the calculated distances between city zones (dij) are further used for describing \({T}_{v,ij,t}^{dis}\). We arbitrarily choose a standard deviation of 0.1 times the distance dij for creating a larger variety of individual trip distances. In order to avoid negative duration and distances, we truncate the probability distributions. We choose arbitrary lower and upper ranges of 0.1 times the mean and normalize the probability density functions accordingly. We let tdur and tdis describe random variables and formulate the truncated probability distributions of \({T}_{v,ij,t}^{dur}\) and \({T}_{v,ij,t}^{dis}\) as:

$${T}_{v,ij,t}^{dur} \sim \frac{\frac{1}{\sqrt{2\pi {\sigma }_{ij,t}^{2}}}\cdot exp-\frac{{({t}^{dur}-{\mu }_{ij,t})}^{2}}{2{\sigma }^{2}}}{\frac{1}{\sqrt{2\pi {\sigma }_{ij,t}^{2}}}\cdot {\int }_{-\infty }^{1.1{\mu }_{ij,t}}\,exp-\frac{{({t}^{dur}-{\mu }_{ij,t})}^{2}}{2{\sigma }^{2}}d{t}^{dur}-\frac{1}{\sqrt{2\pi {\sigma }_{ij,t}^{2}}}\cdot {\int }_{0.9{\mu }_{ij,t}}^{\infty }\,exp-\frac{{({t}^{dur}-{\mu }_{ij,t})}^{2}}{2{\sigma }^{2}}d{t}^{dur}}$$
(20)
$${T}_{v,ij,t}^{dis} \sim \frac{\frac{1}{\sqrt{2\pi {(0.1\cdot {d}_{ij})}^{2}}}\cdot exp-\frac{{({t}^{dis}-{d}_{ij})}^{2}}{2{(0.1\cdot {d}_{ij})}^{2}}}{\frac{1}{\sqrt{2\pi {(0.1\cdot {d}_{ij})}^{2}}}\cdot {\int }_{-\infty }^{1.1{d}_{ij}}\,exp-\frac{{({t}^{dis}-{d}_{ij})}^{2}}{2{(0.1\cdot {d}_{ij})}^{2}}d{t}^{dis}-\frac{1}{\sqrt{2\pi {(0.1\cdot {d}_{ij})}^{2}}}\cdot {\int }_{0.9{d}_{ij}}^{\infty }\,exp-\frac{{({t}^{dis}-{d}_{ij})}^{2}}{2{(0.1\cdot {d}_{ij})}^{2}}d{t}^{dis}}$$
(21)

Sampling

We sample the state transitions and therewith the traffic of cars among all city zones by generating a random variable between 0 and 1 for each car in each time step. We then compare the value of the random variable with one of the Multinomial distributions of \({p}_{ij,t}^{joint}\) and decide for each car in which zone it stays parked or to which zone it drives. Which Multinomial distribution we choose from \({p}_{ij,t}^{joint}\) depends on the location of each car, that is i, and the time step of simulation which is t. The distance and duration of trips are sampled by generating random variables from the truncated Gaussian distributions in (20) and (21) for each trip.

Model selection for e drive and e dest

The parameters edrive and edest determine the functional relationship between travel time data and the joint Origin-Destination probabilities of the modeled traffic system. In order to find a good choice for them, we must evaluate the values of interests in our model with real world observations. The values that we are mainly interested in in this modeling task are the circadian rhythm of traffic activity and the spatio-temporal distribution of car parking densities. We hence use measurements of these two values to make a good choice for edrive and edest. The subsequent subsections describe how the validation works in detail. In this subsection we describe an algorithm for finding a good solution for both parameters. Finding optimal values for edrive and edest turns out to be a non-convex optimization problem. Optimizing via gradient descent is hence strongly dependent on the initial parameter values. The first part of the optimization process is hence about finding a good starting point. In order to do this, we evaluate the results of traffic activity and parking density for several values of edrive and edest between zero and one, as well as between one and e.g. 10. Both edrive and edest are used as exponents in the functional relationship of Eq. (18). We stretch the probability distributions for larger values than one and clinch these for smaller values than one, compared to a linear functional relationship that is given for a value of one for edrive and edest. We let n be the n-th iteration of the optimization algorithm and introduce:

  • e(n): = current parameter

  • \({e}_{b}^{(n)}\): = best parameter until now

  • \({\rm{\nabla }}{e}^{(n)}={e}^{(n)}-{e}_{b}^{(n)}\): = current parameter gradient

  • E(n): = evaluation error with current parameter

  • \({E}_{b}^{(n)}\): = best evaluation error until now

  • \(\nabla {E}^{(n)}={E}^{(n)}-{E}_{b}^{(n)}\): = current error gradient

  • η(n) = η: = constant step size parameter

We test different initial parameters and set the one with the lowest evaluation error as the best found parameter \({e}_{b}^{(n)}\) with the best found evaluation error \({e}_{b}^{(n)}\) until now. In each iteration n, we calculate the evaluation error E(n) that is given with the current parameter e(n) as e.g. the Mean Squared Error between modeled and measured target values. The parameter e(n) can thereby be either edrive or edest. Then, we calculate the new parameter gradient e(n) and the new error gradient E(n). We reset the current parameter value in the case that the current parameter does not improve the best evaluation error (E(n) > 0):

$${e}^{(n)}\leftarrow {e}_{b}^{(n)}$$
(22)

In the case that the current parameter improves the best evaluation error until now (E(n) < 0), we update the best found values:

$${e}_{b}^{(n)}\leftarrow {e}^{(n)}\wedge {E}_{b}^{(n)}\leftarrow {E}^{(n)}$$
(23)

We use the previous values for updating our model parameter towards a decreasing evaluation error for the next iteration:

$${e}^{(n+1)}={e}^{(n)}-\eta \cdot \frac{\nabla {E}^{(n)}}{\nabla {e}^{(n)}}$$
(24)

Model selection for p min and p max

The parameters pmin and pmax determine the range of probabilities for deciding whether a car drives or parks given the time of day and zone of presence; they were initially set to arbitrary values that satisfy the conditions in (8) and (9). Once we have first sampling results, we can use pmin and pmax to normalize our traffic system according to one or more set point values. We defined the two state transition properties travel duration and travel distance which we can use both for normalization. Here, we use the share of driving time within the lifetime of a vehicle for normalization and introduce:

  • Aset: = the realistic amount of driving time

  • Adrive: = the sampled amount of driving time

If we assume that the realistic amount of driving time within the lifetime of a car is around 5%, the set point value to which we must approach Adrive is Aset = 0.05. Alternatively, one can use other values such as the average number of trips, the average fuel consumption or the average travel distance of cars as values for normalization. The only condition is that these values must be recorded as state transition properties. A computationally effective way is to normalize each dataset separately. In this case, we normalize the traffic system with respect to the time steps t = 1…T and calculate:

$${A}_{drive}=\frac{{\sum }_{v=1}^{V}\,{\sum }_{t=1}^{T}\,{T}_{v,ij,t}^{dur}}{V\cdot {c}_{time}}$$
(25)

The constant ctime considers the time resolution of our simulation. For our analysis, the original datasets entail mean travel times μij,t with one second resolution. This gives a time constant of ctime = 246060. For one minute resolution, we would respectively calculate ctime = 2460, and ctime = 24 for an hourly resolution. A more accurate but computationally intensive way is to normalize the traffic system throughout all datasets that are available for each city. In this case, we consider weekly, seasonal and inter annual variations of traffic patterns. With G being the number of datasets that we want to normalize and \({({T}_{v,ij,t}^{dur})}^{(g)}\) being the transition property of the g-th sample outcome, we calculate:

$${A}_{drive}=\frac{{\sum }_{g=1}^{G}\,{\sum }_{v=1}^{V}\,{\sum }_{t=1}^{T}\,{({T}_{v,ij,t}^{dur})}^{(g)}}{G\cdot V\cdot {c}_{time}}$$
(26)

We use a numeric iterative algorithm to iteratively calculate the parameter values that approach Adrive to Aset. In each iteration, we update either pmin or pmax by solving one of two equations. Which parameter we update, depends on the relation between Adrive and Aset. In each n-th iteration, we calculate:

$${p}_{max}^{(n+1)}={p}_{max}^{(n)}+({p}_{max}^{(n)}-{p}_{min}^{(n)})\frac{{A}_{set}^{(n)}-{A}_{drive}^{(n)}}{n\cdot {A}_{drive}^{(n)}}\quad \forall \quad {p}_{max}^{(n)}\ne 1\quad \wedge \quad {p}_{max}^{(n)}\ne {p}_{min}^{(n)}$$
(27)
$${p}_{min}^{(n+1)}={p}_{min}^{(n)}+({p}_{max}^{(n)}-{p}_{min}^{(n)})\frac{{A}_{set}^{(n)}-{A}_{drive}^{(n)}}{n\cdot {A}_{drive}^{(n)}}\quad \forall \quad {p}_{max}^{(n)}=1\quad \wedge \quad {p}_{max}^{(n)}\ne {p}_{min}^{(n)}$$
(28)

With this conditioning, the algorithm first tunes pmax and then pmin once the range of possible values is reached for pmax. The upper range of pmax here is 1 and the lower range the actual value of pmin in each iteration.

Sampled traffic activity and parking densities

The sampled traffic activity and parking density of cars in each zone can be derived from the state transition entries. If \({T}_{v,ij,t}^{dur}\) and \({T}_{v,ij,t}^{dis}\) are zero, it means that a car v is parked in zone i at time step t. By summing the number of parked cars in each time step and each zone, we can derive the spatio-temporal distribution parking and driving densities. For calculating the circadian rhythm of overall traffic activity, we sum the number of driving cars among all city zones for each time step separately. For all i = 1…N and t = 1…T, we introduce:

  • Pi,t: = the sum of sampled cars parking in zone i during time step t

  • Di,t: = the sum of sampled cars driving in zone i during time step t

  • \(T{r}_{t}=\mathop{\sum }\limits_{i=1}^{N}\,{D}_{i,t}\) := the overall traffic activity at time step t

Measured traffic activity and parking densities

The generated results can be validated for cities in which vehicle counts on street segments and parking measurements are available for the same period of time as the travel time data that we use for sampling. The city of Melbourne provides such data. Vehicle count sensors and between 5,073–9,288 underground parking sensors provide measurements of traffic activity between 2014–2017 and on-street parking bay occupancy with one second accuracy for the years 2011–201720 Each vehicle count sensor measures the number of vehicles that passes the respective street segment and distinguishes between 22 types of different vehicles. Each parking sensor measures the arrival and departure times of cars being parked in their respective bays. A separate set of files provides information on latitudinal and longitudinal coordinates of the parking bays that are equipped with sensors. We use the mutual information of these files to first allocate the sensors and then to assign each sensor to one distinct city zone. Sensors are assigned to the city zone to which they have the shortest beeline distance. The parking density of each zone in each time step is then calculated as the share of time in which all its sensed parking bays are occupied by a car. With M being the subset of city zones for which parking measurements are available we introduce for all mM and t = 1…T:

  • \({Y}_{m,t}^{Park}\): = the sum of parking bay occupancy measured in zone m during time step t

  • \({Y}_{t}^{Traf}\): = the sum of vehicles counted on all sensed street segments during time step t

Validation metric

We validate our model by calculating the Mean Squared Error between sampled and measured traffic activities and parking densities. We scale all sampled and measured values with respect to their minimum and maximum values among all time steps t = 1…T in order to be able to make sensible comparisons. For all validatable zones mM and t = 1…T, we calculate the sampled parking densities \({\widehat{P}}_{m,t}\), sampled traffic activities \(\widehat{T}{r}_{t}\), measured parking densities \({\hat{Y}}_{m,t}^{Park}\) and measured traffic activities \({\hat{Y}}_{t}^{Traf}\) as:

$${\widehat{P}}_{m,t}=\frac{{P}_{m,t}-mi{n}_{t}\{{P}_{m,t}\}}{ma{x}_{t}\{{P}_{m,t}\}-mi{n}_{t}\{{P}_{m,t}\}}$$
(29)
$$\widehat{T}{r}_{t}=\frac{{\sum }_{i=1}^{N}\,{D}_{i,t}-mi{n}_{t}\{{\sum }_{i=1}^{N}\,{D}_{i,t}\}}{ma{x}_{t}\{{\sum }_{i=1}^{N}\,{D}_{i,t}\}-mi{n}_{t}\{{\sum }_{i=1}^{N}\,{D}_{i,t}\}}$$
(30)
$${\hat{Y}}_{m,t}^{Park}=\frac{{Y}_{m,t}^{Park}-mi{n}_{t}\{{Y}_{m,t}^{Park}\}}{ma{x}_{t}\{{Y}_{m,t}^{Park}\}-mi{n}_{t}\{{Y}_{m,t}^{Park}\}}$$
(31)
$${\hat{Y}}_{t}^{Traf}=\frac{{Y}_{t}^{Traf}-mi{n}_{t}\{{Y}_{t}^{Traf}\}}{ma{x}_{t}\{{Y}_{t}^{Traf}\}-mi{n}_{t}\{{Y}_{t}^{Traf}\}}$$
(32)

The percentual fits \({F}_{m}^{Park}\) and FTraf between the measured and sampled values are then calculated as:

$${F}_{m}^{Park}=100 \% \cdot \left(1-\frac{1}{T}\mathop{\sum }\limits_{t=1}^{T}\,{({\widehat{P}}_{m,t}-{\hat{Y}}_{m,t}^{Park})}^{2}\right)$$
(33)
$${F}^{Traf}=100 \% \cdot \left(1-\frac{1}{T}\mathop{\sum }\limits_{t=1}^{T}\,{(\widehat{T}{r}_{t}-{\hat{Y}}_{t}^{Traf})}^{2}\right)$$
(34)

The average, minimum and maximum percentual fits of parking density for each sampled dataset can then be calculated as their values among all validated city zones mM, with |M| being the number of elements of set M, as:

$$\frac{1}{| M| }\sum _{m\in M}\,{F}_{m}$$
(35)
$$mi{n}_{m}\{{F}_{m}\}$$
(36)
$$ma{x}_{m}\{{F}_{m}\}$$
(37)

Usage Notes

The performance of the presented model depends on the sparsity of the travel time data. The travel time data that is collected by Uber and that we have processed in this analysis contains different degrees of sparsity for each city and each dataset (Tables 3 & 4). We recommend users to evaluate the visualizations of the generated parking densities that we provide in a Graphics Interchange Format (GIF)19. We further sample parking densities with the same parameters for all datasets. The state transition properties that result from these samples are provided in the sampling_parameters.csv for each city. Users can increase the accuracy of the individual parking density maps that we provide by performing the additional computational work that is involved with the model selection that we introduce in the Jupyter Notebook instructions18.