City-scale car traffic and parking density maps from Uber Movement travel time data

Car parking is of central importance to congestion on roads and the urban planning process of optimizing road networks, pricing parking lots and planning land use. The efficient placement, sizing and grid connection of charging stations for electric cars makes it even more important to know the spatio-temporal distribution of car parking densities on the scale of entire cities. Here, we generate car parking density maps using travel time measurements only. We formulate a Hidden Markov Model that contains non-linear functional relationships between the changing average travel times among the zones of a city and both the traffic activity and flow direction probabilities of cars. We then sample the traffic flow for 1,000 cars per city zone for each city from these probability distributions and normalize the resulting spatial parking distribution of cars in each time step. Our results cover the years 2015–2018 for 34 cities worldwide. We validate the model for Melbourne and reach about 90% accuracy for parking densities and over 93% for circadian rhythms of traffic activity.


Introduction
Car parking is of central importance to congestion on roads and its social, environmental and economic impacts on society. An imbalance between on- and off-street parking prices, for instance, leads to cruising for cheaper on-street parking lots, which in turn is responsible for 8-74% of traffic in downtown areas 1 . If we consider that combustion-engine cars are directly responsible for 14% of global greenhouse gas emissions 2 and 3.3 million premature deaths worldwide each year 3 , we find that there is large potential for environmental savings. Knowing where cars are parked at what times could support urban planning in the process of optimizing road networks, pricing parking lots and planning land use.
The electrification of the mobility sector makes it even more important to know where cars are parked at what times. At times and locations where large numbers of cars are parked with high density and must charge simultaneously for their upcoming trips, their additional electricity consumption can stress the local grid 4 . On the other hand, the batteries of parked electric cars can serve as valuable storage capacity for balancing grid operation with large shares of intermittent renewable energy sources [5][6][7] . For an efficient placement, sizing and grid connection of charging stations, it is therefore important to know the spatio-temporal distribution of car parking densities on the scale of entire cities [8][9][10] .
The existing literature on urban traffic does not provide data on spatio-temporal parking densities on the scale of entire cities. Classic traffic system research aims at the development of optimal transport networks by minimizing congestion on roads 11,12 . The existing theories focus on the stream variables speed, flow and concentration of vehicles [13][14][15][16] ; in these, density always refers to the concentration of vehicles on roads. Car parking density maps instead would require data about the number of cars parked in each region of a city at various times of a day. We find that the main barriers for measuring such data directly are the costs and efforts of placing and operating sensors.
In this analysis, we explore whether car parking density maps on the scale of entire cities can be estimated from travel time measurements among different zones of a city only. We formulate a Hidden Markov Model in which states are the locations of cars and emission measurements are the changing travel times among the zones of that city throughout a day. We apply the model to travel time data that is measured from undertaken Uber rides 17 to generate the desired parking density maps for 34 cities around the globe. We further provide complete code and instructions on extending the presented model for a wider range of traffic and parking system analyses 18 .

Results
We generate car parking density maps from Uber travel time data for 34 cities worldwide 19 . We formulate non-linear functional relationships between the changing average travel times among the zones of a city and both the traffic activity and flow direction probabilities of cars. We derive origin-destination matrices for each hour of the day and for all zones of an entire town with these functional relationships. We then sample the traffic flow for 1,000 cars per city zone for each city from the resulting probability distributions and normalize the resulting spatial distribution of cars in each time step. The resulting parking density maps are hence independent from the number of sampled cars and can be scaled to arbitrary vehicle fleet sizes.
The resolution of the generated maps in time is one hour. The time periods for which we generate the results depend on the availability of the underlying Uber travel time measurements. At the time of writing this, Uber provides travel time data for the years 2015-2018 and distinguishes these by weekdays, weekends and the quarter of a year. In addition to these, Uber also provides travel time statistics that are collected regardless of the day type; these datasets cover a larger variety of trips than the separated datasets and therefore have lower sparsity.
The resolution of the generated maps in space varies and depends on how Uber divides cities into different zones. Figure 1 shows the different scales at which the generated parking maps can be used. The parking maps can be used for an analysis of the entire suburban area of a town (Fig. 1a,b) or the center of a town (Fig. 1c); the accuracy is also adequate for an analysis of parking densities within the city center of a town (Fig. 1d).
We subsequently validate the presented results in two steps for the city of Melbourne during the years 2015-2017, for both the travel time data that is separated by day types and the data that is not separated by day types. In a first step, we validate the circadian rhythm of traffic activity that our model generates. In a second step, we validate the car parking densities that result from our traffic flow model for 100-105 city zones; the city zones that we validate are given by the locations of the operated underground parking sensors in Melbourne. Our choice of Melbourne for validation is motivated solely by the availability of measured traffic count and parking density data; we make this choice independently of the performance of our model.

Validation of traffic activities.
A first substantial hypothesis in our model is that whether a car drives to another city zone or stays parked in its origin zone is a function of the changing travel times from the car's current location to all possible destination zones throughout a day. This means that the higher the measured travel times at a certain time of day in a particular city zone are, the more likely it is that a car will undertake a trip to another destination at that time of day. This generates a characteristic circadian rhythm of traffic activity for each sampled day. We validate our sampled circadian rhythms with vehicle count data from street segments in the city of Melbourne, which we use as an indicator for traffic activity 20 . Table 1 contains the numeric results of the validation for the years 2015 until 2017. The percentage fit between the modeled and the measured circadian rhythm is 94-99% for weekdays and 93-99% for weekends. We observe that the fit is higher for weekdays when using mean travel time data that is not separated by day types; these datasets usually have lower sparsity than those that are separated by day types. Except for the last three quarters of 2016, the percentage fit is larger for weekends when using the travel time data that is separated by day types than when using the non-separated data. Figure 2 visualizes the results exemplarily for the first quarter of 2017. We can see that our sampled traffic activity is shifted towards the weekday patterns when using travel time data that is not separated by day type, and therefore deviates from the actually measured traffic patterns when validated against measurements on weekends (Fig. 2b). For all other datasets (Fig. 2a,c,d), the sampled and measured traffic activities match with very high accuracy.
The circadian rhythm of urban traffic has four characteristic features that are similar for all larger cities around the world 21-25 : first, a relatively high morning rush hour traffic between 8:00-10:00 am; second, a moderate lunch time traffic between 11:00 am-1:00 pm; third, a peak evening rush hour traffic between 4:00-6:00 pm; fourth, a relatively low midnight traffic between 1:00-3:00 am. We can observe that both the measured and the sampled circadian rhythms contain the four characteristic features of urban traffic activity.
Validation of parking densities. The second substantial hypothesis in our model is that the destination zone that a car will choose for a trip is a function of the traffic activity in that zone at that given time of the day. This means that the higher the measured travel times to a particular city zone are, the more likely it is that cars will choose that zone as a destination. This generates characteristic flow directions of cars and, together with the first hypothesis, the parking density of cars among the zones of a city. We validate the characteristics of our modeled car parking densities with underground parking sensor measurements in about 100 of the total 2,357 modeled city zones in Melbourne 20 . The validated city zones are given by the locations of the operated underground sensors, which lie mostly around the city center of Melbourne. Table 2 contains the numeric results of the validation for the years 2015 until 2017. The percentage fit between sampled and measured parking densities is in a similar range for weekdays and weekends. We observe average fits of 82-92% for weekdays and 82-90% for weekends. The minimum fit is 64-83% for weekdays and 53-82% for weekends. The maximum fit is 94-97% for weekdays and 95-99% for weekends. The fit between the sampled and measured parking densities during weekdays is higher for the travel time data that is not separated by day type than for the separated data from 2015 until the first quarter of 2016. For the following time periods, the percentage fit is higher for the separated travel time datasets than for those that do not separate the data by day type. This turning point aligns with a larger decrease in the sparsity of the underlying travel time data for Melbourne from the first to the second quarter of 2016 (Table 3). During weekends, the percentage fit is always larger than or equal for the non-separated datasets compared to the separated ones. Figure 3 visualizes the results for both the non-separated (Fig. 3a,b) and the separated (Fig. 3c,d) datasets. We can observe that the patterns of parking density match better with the non-separated travel time data for both weekdays and weekends (Fig. 3a,b) than with the separated data (Fig. 3c,d). In the former, both the sampled (red) and measured (blue) parking densities rise, reach their peaks and decline at the same time. With the datasets that are separated by day types (Fig. 3c,d), the modeled (red) rise, peak and decline phases of parking density mismatch the measured (blue) values.

Discussion
We explore whether temporal car parking density maps can be modeled on the scale of an entire city, given only travel time measurements between the zones of that city. Using the Hidden Markov traffic model that we present and the travel time data that is published by Uber, we find that parking densities can be estimated with about 90% accuracy for both weekdays and weekends. The circadian rhythm of daily commuting traffic can further be sampled with an accuracy of over 93% with our model. Although Uber users may not take an Uber ride for their daily commute but rather for extraordinary trips, which creates a large sparsity and bias in most of the data, we find that the measured travel time data is in most cases representative enough for creating the desired car parking density maps on the scale of entire cities.
The research question we ask is new and the data that we generate is the first of its kind in the literature. In contrast to classic traffic system research, our theory estimates the density of car parking by sampling individual driving behaviour. It contains elements of the Cumulative Vehicle Count Curves (N-curves) 26 and Wardrop's second principle of equilibrium 27 . Our theory, however, differs in the fundamental hypothesis that the density of cars parked in an area is a function of the changes in travel time between the origin and destination pairs that relate to this area throughout a day.
Our results are consistent with those of previously performed traffic system analyses [21][22][23][24][25] . The parking densities and commuting trends that we model based on our central hypotheses mostly match with those that are measured in the city of Melbourne 20 . We choose Melbourne for the validation of our results based on the parking and traffic data that is publicly available.
The performance of the presented model depends on the sparsity of the origin-destination travel time matrices that we use to sample the traffic flow of cars. We can observe that the travel time data that is collected without separation by weekdays and weekends has a lower sparsity and leads to more accurate parking density trends (Fig. 3a,b) than the travel time data that is separated by weekdays and weekends (Fig. 3c,d). For the modeled circadian rhythms of traffic activity, however, the opposite holds: the traffic activity features at weekends are better modeled with the travel time data that is separated by day types (Fig. 2d) than with the non-separated data (Fig. 2b). One major reason is that the travel time statistics for the five weekdays carry greater weight in the non-separated datasets than the two weekend days. However, this does not hold for the datasets of the year 2016, which could again be caused by biases that are introduced through the sparsity of the datasets. City zones can further behave as ever-growing sinks if more cars flow into them than out of them; this is again caused by the sparsity of the underlying data. Further research can be done on reducing the sparsity of the underlying travel time data by using, e.g., satellite imagery to estimate the missing travel time matrix entries of the Uber travel time data. This could reduce the ever-growing sink characteristics of zones and further biases in the traffic flow that are given by the Uber data. Figure 4 visualizes the generated parking density maps for three more cities at the times of their largest diversity. The diversity of the visualized parking maps underlines the importance of our generated data for the electrification of the mobility sector: policies that are found to be effective for charging electric cars in one city can be useless in other cities due to different patterns of car parking.
On the other hand, our data confirms that generally applicable policies also exist: the commuting behaviour of car users creates high parking densities at commercial centers during working hours. This always includes the times of peak solar power generation during midday, at which grid balancing services of electric car batteries could be most useful.

Methods
We are given the arithmetic mean of hourly travel time measurements between different zones of a city (Fig. 5a) and want to estimate the traffic flow and spatial parking distribution of cars in that city. In a first stage, we estimate the probabilities of car traffic between zones as a function of mean travel times (Fig. 5b). These probabilities exploit the changes in mean travel time between the zones of the city throughout a day to approximate information about when cars would drive and where they would drive to. In a second stage, we sample individual car traffic from these probability distributions and determine the number of cars that are parked in a zone as a function of cars flowing in and out of that zone (Fig. 5c). In a third and last stage, we use validation results to tune the parameters of the probability distributions that we sample from (Fig. 5d). For each model parameter or set of model parameters that we can freely choose, a set point value is used to make a good choice. A set point value can for instance be the evaluation error between sampled and measured parking densities and traffic activity, or the average number of trips and travel distances.
Uber travel time data. The Uber Movement project provides statistical data about travel times between different zones of a city. At the time of writing, data is available for 34 cities worldwide with one-second resolution. The data distinguishes between each quarter of several years. Depending on the city of interest, data is available from the years 2015 until 2018. For each quarter of these years, three types of datasets are available. The first type contains the aggregated measurements of travel time during weekdays and weekends. The second type contains measurements for weekdays only, and the third type contains measurements for weekends only. Detailed information on how the statistics were derived can be retrieved from Uber's official methodology paper 28 . The data includes both central and suburban regions of a city, which are divided into up to 5,260 zones. The raw travel data, as it is published by Uber, consists of four entry types. We let N be the number of zones into which the city is divided and T be the number of discrete time steps into which one day is divided. The four entry types of the travel data can then be defined for all t = 1…T and i, j = 1…N as:
• μ_ij,t := the mean travel time from zone i to zone j at daytime t
• σ_ij,t := the standard deviation of travel time from zone i to zone j at daytime t
• geo_μ_ij,t := the geometric mean travel time from zone i to zone j at daytime t
• geo_σ_ij,t := the geometric standard deviation of travel time from zone i to zone j at daytime t

The sparsity of data. The sparsity of the datasets plays an important role for the performance of the model presented here. The sparsity depends on the number of Uber users in the provided cities and time periods. Measurements are only given between zones and at times at which a sufficient number of Uber rides were undertaken, so as to ensure a sufficiently good representation of overall traffic 28 .
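The four entry types above map directly onto the columns of the Uber Movement "travel times by hour of day" CSV exports. A minimal sketch of loading such a file into an origin-destination dictionary follows; the column names (`sourceid`, `dstid`, `hod`, `mean_travel_time`) are assumptions based on the published exports and should be verified against the actual download:

```python
import csv
import io

def load_mean_travel_times(csv_text):
    """Parse an Uber Movement hourly travel time CSV into a dictionary
    mu[(i, j, t)] of mean travel times in seconds (assumed schema)."""
    mu = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        i = int(row["sourceid"])          # origin zone id
        j = int(row["dstid"])             # destination zone id
        t = int(row["hod"])               # hour of day, 0-23
        mu[(i, j, t)] = float(row["mean_travel_time"])
    return mu

# Two illustrative rows; real files contain millions of entries.
sample = """sourceid,dstid,hod,mean_travel_time,standard_deviation_travel_time
1,2,8,540.0,120.0
2,1,8,600.0,150.0
"""
mu = load_mean_travel_times(sample)
```

Missing (i, j, t) keys in the resulting dictionary correspond exactly to the sparsity of the dataset discussed below.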
This naturally generates a bias in the data that is larger the fewer Uber rides were undertaken in the time period and city of interest. Tables 3 and 4 give an overview of the sparsity of the currently provided datasets. We can see that the sparsity is generally lower for the datasets that contain travel time measurements for both day types. We further see that the sparsity of the datasets fluctuates but generally decreases over time as the user base of Uber grows and more rides are measured between city zones at various times of the day. We define the sparsity of a dataset as one minus the available number of data pairs divided by the maximum possible number of data pairs.

| City (N) | Type | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 | Q13 | Q14 | Q15 | Q16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| – | All | – | – | – | – | 82 | 75 | 73 | 72 | 71 | 66 | 64 | 64 | 64 | 60 | 58 | 59 |
| | WD | – | – | – | – | 85 | 79 | 78 | 77 | 76 | 72 | 70 | 70 | 70 | 66 | 65 | 65 |
| | WE | – | – | – | – | 88 | 83 | 81 | 79 | 79 | 74 | 72 | 72 | 73 | 69 | 67 | 68 |
| Bangalore (N = 198) | All | – | – | – | – | 24 | 19 | 9 | 9 | 12 | 12 | 13 | 12 | 12 | 10 | 11 | 12 |
| | WD | – | – | – | – | 30 | 24 | 12 | 13 | 17 | 16 | 17 | 17 | 16 | 14 | 15 | 16 |
| | WE | – | – | – | – | 39 | 33 | 19 | 19 | 24 | 22 | 24 | 24 | 24 | 21 | 22 | 24 |
| Bogota (N = 1,160) | All | – | – | – | – | 87 | 84 | 82 | 74 | 73 | 71 | 69 | 67 | 69 | 68 | 68 | 69 |
| | WD | – | – | – | – | 89 | 87 | 85 | 78 | 77 | 75 | 74 | 72 | 73 | 72 | 73 | 73 |
| | WE | – | – | – | – | 92 | 90 | 88 | 82 | 81 | 78 | 76 | 75 | 76 | 75 | 75 | 77 |
| Boston (N = 1,247) | All | – | – | – | – | 73 | 71 | 68 | 66 | 67 | 65 | 64 | 63 | 65 | 63 | 61 | 62 |
| | WD | – | – | – | – | 79 | 77 | 75 | 73 | 74 | 72 | 71 | 70 | 71 | 70 | 68 | 68 |
| | WE | – | – | – | – | 80 | 78 | 76 | 74 | 74 | 73 | 72 | 71 | 73 | 72 | 70 | 71 |
| Los Angeles (N = 2,716) | All | – | – | – | – | 89 | 87 | 85 | 84 | 84 | 84 | 83 | 83 | 82 | 82 | 82 | 82 |
| | WD | – | – | – | – | 91 | 90 | 87 | 87 | 87 | 87 | 86 | 85 | 85 | 85 | 85 | 85 |
| | WE | – | – | – | – | 93 | 93 | 91 | 90 | 90 | 90 | 89 | 89 | 89 | 89 | 89 | 89 |
| Manchester (N = 246) | All | – | – | – | – | 82 | 80 | 78 | 73 | 72 | 69 | 66 | 65 | 66 | 66 | 65 | 63 |
| | WD | – | – | – | – | 86 | 85 | 83 | 78 | 78 | 75 | 73 | 72 | 72 | 72 | 72 | 70 |
| | WE | – | – | – | – | 88 | 86 | 85 | 80 | 80 | 78 | 75 | 74 | 76 | 75 | 74 | 73 |
| Melbourne (N = 2,357) | All | – | – | – | – | 84 | 82 | 81 | 80 | 78 | 78 | 79 | 77 | 76 | 79 | 79 | 78 |
| | WD | – | – | – | – | 86 | 85 | 84 | 83 | 81 | 81 | 82 | 80 | 79 | 82 | 82 | 81 |
| | WE | – | – | – | – | 90 | 89 | 88 | 87 | 86 | 86 | 87 | 85 | 85 | 86 | 87 | 86 |
| Mumbai (N = 695) | All | – | – | – | – | 75 | 72 | 67 | 60 | 59 | 57 | 58 | 55 | 55 | 53 | 53 | 53 |
| | WD | – | – | – | – | 78 | 75 | 70 | 64 | 63 | 61 | 62 | 59 | 59 | 57 | 58 | 57 |
| | WE | – | – | – | – | 82 | 79 | 75 | 69 | 69 | 67 | 67 | 65 | 65 | 63 | 63 | 63 |

Table 3. The sparsity of the currently provided Uber travel time datasets in percent. Cities starting with A-M; the name of the first listed city was lost in extraction. Columns Q1-Q16 list the 16 quarterly datasets from 2015 to 2018 in chronological order; "–" indicates that no dataset is provided. "All" denotes the datasets aggregated over both day types, "WD" weekdays only and "WE" weekends only. The variable N indicates the total number of city zone polygons into which the respective city is divided.
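The sparsity definition above (one minus available data pairs over maximum possible data pairs) can be sketched as follows; whether self-pairs (i = j) count towards the maximum is an assumption here, excluded because the Uber data describes travel between distinct zones:

```python
def sparsity(available_pairs, n_zones, n_timesteps):
    """Sparsity = 1 - available / maximum possible origin-destination pairs.

    The maximum counts N * (N - 1) directed zone pairs (self-trips excluded,
    an assumption) for each of the T time steps of a day.
    """
    max_pairs = n_zones * (n_zones - 1) * n_timesteps
    return 1.0 - available_pairs / max_pairs
```

For example, a city with 3 zones, one time step and 3 measured pairs out of the 6 possible directed pairs has a sparsity of 0.5.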
City zones and their distances. Each zone of a city is described by a polygon with up to several hundred vertices. A separate file provides the latitudinal and longitudinal coordinates of these vertices for each polygon of each city. We let S_i be the number of vertices of the polygon of zone i, and x_i,j and y_i,j be the longitudinal and latitudinal coordinates of vertex j. We represent each city zone polygon by its centroid, also known as the center of gravity. We compute the area A_i of each city zone i and the coordinates of the i-th centroid (long_i | lat_i) for all i = 1…N and j = 1…S_i, where j = 0 represents the same vertex as j = S_i, with the standard polygon centroid formulas:

A_i = (1/2) · Σ_{j=1…S_i} (x_i,j−1 · y_i,j − x_i,j · y_i,j−1)
long_i = (1/(6 · A_i)) · Σ_{j=1…S_i} (x_i,j−1 + x_i,j) · (x_i,j−1 · y_i,j − x_i,j · y_i,j−1)
lat_i = (1/(6 · A_i)) · Σ_{j=1…S_i} (y_i,j−1 + y_i,j) · (x_i,j−1 · y_i,j − x_i,j · y_i,j−1)

The easiest way to calculate the distance between two points on the planet is to assume that latitudinal and longitudinal lines are straight. Using a constant distance of 111.3 km between each latitudinal degree and an average distance of 71.5 km per longitudinal degree, we can simply calculate distances with the Pythagorean theorem. A more accurate way is to consider that longitudinal lines are not straight but bend depending on the latitudinal position: the distance between two longitudinal degrees is around 111.3 km at the equator and 0 at the north and south poles. The dependence on the latitudinal position lat can then be expressed as 111.3 km ⋅ cos(lat). If we replace the average distance of 71.5 km per longitudinal degree with this expression within the Pythagorean theorem, we can calculate the distance d_ij between two zones i and j at the points (long_i | lat_i) and (long_j | lat_j) as:

d_ij = 111.3 km · sqrt( [cos((lat_i + lat_j)/2) · (long_i − long_j)]² + (lat_i − lat_j)² )

Hidden Markov traffic model. We define the state of the traffic system as the distribution of cars among city zones. The state in each time step belongs to a finite set of possible states if the number of sampled cars is finite and constant.
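The centroid and equirectangular distance computations described above can be sketched as follows (function names are illustrative; vertices are (longitude, latitude) pairs in degrees):

```python
import math

def polygon_centroid(vertices):
    """Area centroid of a simple polygon given as [(lon, lat), ...]."""
    area2 = 0.0            # twice the signed area
    cx = cy = 0.0
    for k in range(len(vertices)):
        x0, y0 = vertices[k - 1]   # vertex j-1, wrapping around the polygon
        x1, y1 = vertices[k]
        cross = x0 * y1 - x1 * y0  # shoelace term
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)

def zone_distance_km(p, q):
    """Equirectangular distance: 111.3 km per degree of latitude; the
    longitudinal degree is scaled by cos(latitude) at the midpoint."""
    (lon_i, lat_i), (lon_j, lat_j) = p, q
    mean_lat = math.radians((lat_i + lat_j) / 2)
    dx = 111.3 * math.cos(mean_lat) * (lon_i - lon_j)
    dy = 111.3 * (lat_i - lat_j)
    return math.hypot(dx, dy)
```

For a unit square the centroid is (0.5, 0.5), and two points on the same meridian one degree of latitude apart are 111.3 km from each other.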
We can extend the state space of our traffic model by assigning more properties to a car than just its location. Such properties can for instance be the state of charge of an electric car battery or the number of transported persons. We describe the location of a car at a given time t = 1…T by its city zone and introduce:
• X_t := the distribution of cars among all city zones at time step t

If we let V be the number of cars that we want to simulate, the state space Ω_X contains N^V possible arrangements. Every state X_t depends only on its previous state X_t−1, which gives us a discrete-time Markov chain model. For the state transition probabilities at times t = 1…T − 1 it hence holds that:

P(X_t+1 | X_t, X_t−1, …, X_0) = P(X_t+1 | X_t)

We let the state transition of our model be given by the uncertain flow of cars among all city zones and denote this as ε_t. The number of possible state transitions is hence (N^V)². For each time step t = 1…T − 1, we write:

X_t+1 = X_t + ε_t

The transition probabilities can then be described by:

P(X_t+1 | X_t) = P(ε_t)

The problem that we want to solve can be formulated as finding the car flow probabilities P(ε_t) given the mean travel times μ_ij,t only. This gives us a Hidden Markov Model, as the number of cars flowing among city zones is not directly measured but derived as a function of the changing mean travel times throughout the day. The mean travel times are called emission measurements.
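A single state transition of this Markov chain can be sketched as follows. The nested mapping `p_c` from origin zone and hour to a probability distribution over the joint events (stay parked, or drive to zone 1…N) is a hypothetical data layout, not the paper's implementation:

```python
import random

def step(state, p_c, t, rng=random):
    """One Markov transition X_t -> X_{t+1}.

    state[v] is the current zone of car v. For each car we sample the joint
    event C from p_c[i][t], a list of probabilities over {0 (park), 1..N
    (drive to that zone)}; a non-zero draw moves the car.
    """
    nxt = list(state)
    for v, i in enumerate(state):
        probs = p_c[i][t]
        c = rng.choices(range(len(probs)), weights=probs)[0]
        if c != 0:            # c = 0 means the car stays parked in zone i
            nxt[v] = c        # otherwise the car drives to zone c
    return nxt
```

Applying `step` for t = 1…T − 1 yields one sampled traffic day; repeating over many cars gives the flow ε_t as the difference between consecutive states.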

Probabilities of driving and parking.
For describing whether a car drives to another zone or stays parked in the same zone, we introduce a binary random variable:
• A := binary random variable that describes whether a car drives or parks

We denote the values that a random variable can assume in lower-case letters and let 0 stand for parking and 1 for driving, hence a ∈ Ω_a = {0, 1}. We determine the probabilities for the possible values that A can take by letting the driving activity of cars in a zone be a function of the sum of mean travel times from that zone to all other zones. We assign a probability of driving according to the sum of mean travel time out of a zone in each hour, compared to its minimum and maximum values throughout the entire day. For this purpose, we introduce two parameters:
• p_min := the probability that a car will drive, given the minimum sum of mean travel time out of a zone
• p_max := the probability that a car will drive, given the maximum sum of mean travel time out of a zone

One can assign arbitrary initial values to p_min and p_max and perform a model selection based on validation results to make a good choice of these parameters. We later present an algorithm that utilizes validation results of average driving times for parameter tuning. Two conditions that p_min and p_max must satisfy are:

0 ≤ p_min ≤ p_max ≤ 1
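The min-max scaling of a zone's hourly travel time sums onto the interval [p_min, p_max] can be sketched as follows (an illustrative helper, with p_min and p_max as defined above):

```python
def driving_probabilities(out_sums, p_min, p_max):
    """Min-max scale the hourly sums of mean travel time out of one zone
    onto driving probabilities in [p_min, p_max], 0 <= p_min <= p_max <= 1.

    out_sums[t] is the sum of mean travel times out of the zone at hour t.
    """
    lo, hi = min(out_sums), max(out_sums)
    if hi == lo:                       # no daily variation: fall back to p_min
        return [p_min] * len(out_sums)
    return [p_min + (p_max - p_min) * (s - lo) / (hi - lo) for s in out_sums]
```

The hour with the daily minimum sum maps to p_min, the hour with the daily maximum maps to p_max, and all other hours interpolate linearly in between.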
For each zone of the city, we derive a set of 24 Binomial probability distributions, one for each hour of the day. We define the probabilities of an event P(A = a) for zones i = 1…N and times t = 1…T as:

P_i,t(A = 1) = p_min + (p_max − p_min) · (Σ_j μ_ij,t − min_τ Σ_j μ_ij,τ) / (max_τ Σ_j μ_ij,τ − min_τ Σ_j μ_ij,τ)    (10)
P_i,t(A = 0) = 1 − P_i,t(A = 1)

The high sparsity of our available datasets can create a bias if data is missing more frequently at particular times and between particular zones of a city than at others. Considering the distance between zones and the number of zones for which data points are available in each time step can significantly decrease these biases. With M_i,t being the subset of destination zones for which travel time data is available at time step t, a formulation of Eq. (10) that is more robust against sparsity is obtained by replacing each sum Σ_j μ_ij,t with the distance-normalized average (1/|M_i,t|) · Σ_{j∈M_i,t} μ_ij,t / d_ij.

Probabilities of choosing a destination. For describing which destination zone a car would choose for a trip, we introduce a second random variable:
• B := discrete random variable that describes the destination zone that a car would choose

We also let the popularity of destination zones be a function of the changing travel times throughout a day and ask how likely it is that a car in zone i travels to zone j at a given point t in time. For this purpose, we compare the mean travel time of an origin-destination pair in each hour with its minimum and maximum values throughout the entire day. We hence assign a numeric value between 0 and 1 to each origin-destination pair and each direction. Note that we do not use the sums of mean travel times here but rather perform a min-max scaling of the data in each time step. The closer this value is to one, the more popular a destination zone is at that given hour. We generate a valid Multinomial probability distribution, whose probabilities sum up to 1 for each zone and each hour of the day, by normalizing all values with a factor Z_i,t. The set of values that B can assume is given as b ∈ Ω_b = {1, …, N}.
We define the probability P(B = b) that a car in zone i = 1…N chooses a destination zone j = 1…N at time t = 1…T and the respective normalization factors as:

P_i,t(B = j) = (1/Z_i,t) · (μ_ij,t − min_τ μ_ij,τ) / (max_τ μ_ij,τ − min_τ μ_ij,τ)
Z_i,t = Σ_{j=1…N} (μ_ij,t − min_τ μ_ij,τ) / (max_τ μ_ij,τ − min_τ μ_ij,τ)

Joint origin-destination probabilities. The desired transition probabilities P(ε_t) of the traffic flow can now be described by the joint events of driving and choosing a destination. For a simplified notation of these joint probabilities, we introduce a third random variable:
• C := discrete random variable that describes the joint events of driving (A = a) and choosing a destination (B = b)

We describe the state space of C with c ∈ Ω_C = {0, …, N}. A value of zero means that a car does not drive (A = 0) and hence stays parked in its origin zone. Any other value indicates that a car drives (A = 1) and chooses the respective destination zone (B = 1…N) that is given by the non-zero value of C. For a simplified notation and calculation, we assume the random variables A and B to be independent and write:

P(C = c) = P(A = a) · P(B = b)

Given the independence of A and B, we calculate the probabilities of cars in zone i = 1…N driving to a destination zone j = 1…N or staying parked at time t as:

P_i,t(C = 0) = P_i,t(A = 0)
P_i,t(C = j) = P_i,t(A = 1) · P_i,t(B = j)

Note that the same holds for both the arithmetic and the geometric mean of travel time. We summarize that the uncertain flow of cars among the zones of a city ε_t is modeled as a function of the mean travel times μ_ij,t and described by the probability distribution P(C = c).

Initial value problem. To sample the traffic flow, we need to define a realistic initial state of the system. The problem of finding such a state is commonly referred to as the initial value problem. We introduce:
• X_0 := the initial distribution of cars among all city zones

We uniformly assign a location zone to each car and calculate the state transitions for an entire day.
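The destination-choice distribution and the joint events above can be sketched as follows (illustrative helpers; destinations with no daily variation receive a score of zero, which is an assumption for handling degenerate series):

```python
def destination_probabilities(mu_day, t):
    """mu_day[j] is the list of hourly mean travel times to destination j.

    Min-max scale each destination over the day, then normalize the scores
    at hour t into a valid multinomial distribution (a sketch of P(B = j))."""
    scores = []
    for series in mu_day:
        lo, hi = min(series), max(series)
        scores.append((series[t] - lo) / (hi - lo) if hi > lo else 0.0)
    z = sum(scores)                       # normalization factor Z_{i,t}
    return [s / z for s in scores] if z > 0 else None

def joint_probabilities(p_drive, p_dest):
    """Joint events C: P(C = 0) = 1 - p_drive (car stays parked) and
    P(C = j) = p_drive * P(B = j), assuming A and B are independent."""
    return [1.0 - p_drive] + [p_drive * p for p in p_dest]
```

By construction the joint distribution sums to one, so it can be sampled directly in each transition step.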
We expect the initial uniform distribution of cars to converge towards a naturally shaped distribution after all transition probabilities are applied to the traffic system. We then take the last state of an entirely modeled traffic day as the solution to the initial value problem and set X_0 := X_T.

As for many other natural processes, we assume the stochastic distributions of travel duration and distance to be Gaussian 29 . We use the mean (μ_ij,t) and standard deviation (σ_ij,t) from the original datasets to describe the stochastic distribution of the trip duration T^dur_v,ij,t; the calculated distances between city zones (d_ij) are further used for describing the trip distance T^dis_v,ij,t. We arbitrarily choose a standard deviation of 0.1 times the distance d_ij for creating a larger variety of individual trip distances. In order to avoid negative durations and distances, we truncate the probability distributions. We choose arbitrary lower and upper ranges of 0.1 times the mean and normalize the probability density functions accordingly. We let t_dur and t_dis be the corresponding random variables and formulate their truncated probability distributions accordingly.

We introduce the parameters e_drive and e_dest as exponents in the functional relationship of Eq. (18). We stretch the probability distributions for values larger than one and compress them for values smaller than one, compared to the linear functional relationship that is given for a value of one for e_drive and e_dest. We let n be the n-th iteration of the optimization algorithm and introduce:
• e^(n) := current parameter

We test different initial parameters and set the one with the lowest evaluation error as the best found parameter e_b^(n) with the best found evaluation error E_b^(n) so far. In each iteration n, we calculate the evaluation error E^(n) that is given with the current parameter e^(n), e.g. as the mean squared error between modeled and measured target values. The parameter e^(n) can thereby be either e_drive or e_dest.
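The truncated Gaussian sampling of individual trip durations can be sketched with simple rejection sampling; this is an illustrative approach, as the paper does not prescribe a particular sampling method, and the bounds of mean ± 0.1 · mean follow the paper's arbitrary choice:

```python
import random

def sample_truncated_normal(mean, std, lower, upper, rng=random):
    """Rejection-sample a Gaussian truncated to [lower, upper]; this keeps
    sampled trip durations and distances strictly positive."""
    while True:
        x = rng.gauss(mean, std)
        if lower <= x <= upper:
            return x

# Illustrative trip: mean/std as they would come from the Uber data (seconds).
mean, std = 600.0, 120.0
duration = sample_truncated_normal(mean, std, 0.9 * mean, 1.1 * mean)
```

Every returned sample is guaranteed to lie within the truncation bounds, at the cost of discarding draws that fall outside them.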
Then, we calculate the new parameter gradient ∇e^(n) and the new error gradient ∇E^(n). In the case that the current parameter does not improve the best evaluation error (∇E^(n) > 0), we reset the current parameter value to the best found parameter e_b^(n). In the case that the current parameter improves the best evaluation error so far (∇E^(n) < 0), we update the best found values e_b^(n) and E_b^(n) with the current ones. We then use the previous values for updating our model parameter towards a decreasing evaluation error in the next iteration.

Once we have first sampling results, we can use p_min and p_max (Eqs. 8 and 9) to normalize our traffic system according to one or more set point values. We have defined the two state transition properties travel duration and travel distance, both of which we can use for normalization. Here, we use the share of driving time within the lifetime of a vehicle for normalization and introduce:
• A_set := the realistic amount of driving time
• A_drive := the sampled amount of driving time

If we assume that the realistic amount of driving time within the lifetime of a car is around 5%, the set point value to which we must approach A_drive is A_set = 0.05. Alternatively, one can use other values, such as the average number of trips, the average fuel consumption or the average travel distance of cars, for normalization. The only condition is that these values must be recorded as state transition properties. A computationally effective way is to normalize each dataset separately. In this case, we normalize the traffic system with respect to the time steps t = 1…T and calculate A_drive accordingly. The constant c_time accounts for the time resolution of our simulation. For our analysis, the original datasets contain mean travel times μ_ij,t with one-second resolution. This gives a time constant of c_time = 24⋅60⋅60. For one-minute resolution, we would respectively calculate with c_time = 24⋅60, and with c_time = 24 for an hourly resolution.
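Under these definitions, the sampled share of driving time A_drive can be sketched as the total sampled trip duration divided by the fleet's total available time; treating c_time as the number of resolution units per day is our reading of the text, labeled as an assumption:

```python
def driving_time_share(trip_durations_s, n_cars):
    """A_drive: share of driving time in the sampled fleet's day.

    trip_durations_s are all sampled trip durations in seconds; with
    one-second resolution, c_time = 24*60*60 seconds per day (assumed
    interpretation of the time constant)."""
    c_time = 24 * 60 * 60
    return sum(trip_durations_s) / (n_cars * c_time)
```

A fleet of 10 cars that accumulates 43,200 seconds of total driving time in one day yields A_drive = 0.05, i.e. exactly the set point A_set given in the text.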
A more accurate but computationally intensive way is to normalize the traffic system across all datasets that are available for each city. In this case, we consider weekly, seasonal and inter-annual variations of traffic patterns. With G being the number of datasets that we want to normalize and A_drive^(g) being the transition property of the g-th sample outcome, we calculate:

We use an iterative numeric algorithm to calculate the parameter values that bring A_drive towards A_set. In each iteration, we update either p_min or p_max by solving one of two equations; which parameter we update depends on the relation between A_drive and A_set. In each n-th iteration, we calculate:

Sampled traffic activity and parking densities. The sampled traffic activity and parking density of cars in each zone can be derived from the state transition entries. If T^dur_v,ij,t and T^dis_v,ij,t are zero, car v is parked in zone i at time step t. By summing the number of parked cars in each time step and each zone, we derive the spatio-temporal distribution of parking and driving densities. For calculating the circadian rhythm of overall traffic activity, we sum the number of driving cars among all city zones for each time step separately. For all i = 1…N and t = 1…T, we introduce:

We use the mutual information of these files to first locate the sensors and then assign each sensor to one distinct city zone. Sensors are assigned to the city zone to which they have the shortest beeline distance. The parking density of each zone in each time step is then calculated as the share of time in which all its sensed parking bays are occupied by a car. With M being the subset of city zones for which parking measurements are available, we introduce for all m ∈ M and t = 1…T:
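The counting step that turns state transition entries into parking densities and traffic activity can be sketched as follows. This is a Python illustration with our own data layout: each car carries a (zone, duration, distance) triple per time step, and a car is parked exactly when both its duration and distance entries are zero.

```python
def parking_density(states, n_zones, n_steps):
    """states[v][t] = (zone, t_dur, t_dis) of car v at time step t.
    A car is parked when both its duration and distance entries are zero."""
    parked = [[0] * n_steps for _ in range(n_zones)]
    driving = [0] * n_steps
    n_cars = len(states)
    for car in states:
        for t, (zone, dur, dis) in enumerate(car):
            if dur == 0 and dis == 0:
                parked[zone][t] += 1   # parked in this zone at this step
            else:
                driving[t] += 1        # contributes to overall activity
    # Normalize by fleet size: share of cars parked per zone / driving overall.
    density = [[p / n_cars for p in row] for row in parked]
    activity = [d / n_cars for d in driving]
    return density, activity

# Toy input: two cars, two zones, three time steps.
states = [
    [(0, 0, 0), (0, 600, 3.2), (1, 0, 0)],  # car 0: parked, drives, parks in zone 1
    [(1, 0, 0), (1, 0, 0), (1, 0, 0)],      # car 1: parked in zone 1 throughout
]
density, activity = parking_density(states, n_zones=2, n_steps=3)
assert density[1][2] == 1.0  # both cars parked in zone 1 at t = 2
assert activity[1] == 0.5    # half the fleet driving at t = 1
```

Summing `activity` over all zones and plotting it against the time of day yields the circadian rhythm of traffic activity described in the text.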

Usage Notes
The performance of the presented model depends on the sparsity of the travel time data. The travel time data collected by Uber that we process in this analysis contains different degrees of sparsity for each city and each dataset (Tables 3 & 4). We recommend that users evaluate the visualizations of the generated parking densities, which we provide in the Graphics Interchange Format (GIF) 19 . We further sample parking densities with the same parameters for all datasets; the state transition properties that result from these samples are provided in the sampling_parameters.csv for each city. Users can increase the accuracy of the individual parking density maps that we provide by performing the additional computational work involved in the model selection that we introduce in the Jupyter Notebook instructions 18 .

Data Availability
The travel time data that we use for sampling city-scale parking maps can be downloaded from the Uber Movement project website 17 . The car parking density maps and traffic activity rhythms that we derive from these datasets can be accessed on the Harvard Dataverse 19 . The data used for the validation of parking densities in the city of Melbourne is available on the website of the City of Melbourne 20 .
The data that we generate for each sampled city consists of four different types of files: a first set of files starts with "results_parkingdensities" and contains the share of parked cars in each city zone (rows) during each hour of the day (columns); a second set of files starts with "results_trafficactivity" and contains the circadian rhythm of overall traffic activity; a third file is called "sampling_parameters.csv" and contains the chosen and resulting parameters of the model; the fourth and last file is called "zoneID_coordinates.csv" and contains the representative latitudinal and longitudinal coordinates of each city zone polygon.
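The zone × hour layout of the "results_parkingdensities" files can be read with a few lines of Python; the column headers and values below are our own toy stand-in, not the published files.

```python
import csv
import io

# Toy stand-in for a "results_parkingdensities" file: city zones as rows,
# hours of the day as columns (header names here are our assumption).
raw = io.StringIO(
    "zoneID,h0,h1,h2\n"
    "1,0.90,0.60,0.85\n"
    "2,0.40,0.80,0.95\n"
)
reader = csv.reader(raw)
header = next(reader)  # skip the header row
densities = {row[0]: [float(x) for x in row[1:]] for row in reader}

# Hour with the highest parking share in zone "2":
peak_hour = max(range(len(densities["2"])), key=densities["2"].__getitem__)
assert peak_hour == 2  # h2 holds the maximum share (0.95)
```

The same pattern applies to the "results_trafficactivity" files, with a single row of hourly activity shares instead of one row per zone.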

Code Availability
In addition to the description of our method here, we also provide code and instructions for reproducing the presented results and extending the developed model in the Julia programming language 18 . In a first set of files, we provide the contiguous programs that we use to generate the presented results and their validation; these files end with ".jl" and can be used to reproduce the generated results. In a second set of files, written as Jupyter Notebook instructions, we provide step-by-step explanations of how our model and the validation metric work; these files end with ".ipynb" and can be used to customize our method for individual modeling purposes and to better understand the modeling and validation steps. The presented results are generated without individually performed model selections; they are produced with the model parameters e_drive = 0.5, e_dest = 2, p_min = 0.1 and p_max = 0.9, which we find to be good parameters for the validated cities. The results are further sampled with a total vehicle fleet size of 1,000 cars per city zone; this allows us to exploit the law of large numbers and converge towards realistic distributions of cars while keeping the calculation feasible with moderate computational power.