City-scale synthetic individual-level vehicle trip data

Trip data that records each vehicle's trip activity on the road network describes the operation of urban traffic from the individual perspective, and it is extremely valuable for transportation research. However, because of data privacy restrictions, individual-level trip data cannot be opened to all researchers, although the need for it is urgent. In this paper, we produce a city-scale synthetic individual-level vehicle trip dataset by generating trips for each individual based on historical trip data, balancing data availability against trip privacy protection. Privacy protection inevitably affects the availability of data. Therefore, we have conducted numerous experiments to demonstrate the performance and reliability of the synthetic data in different dimensions and at different granularities, to help users properly judge the tasks it can support. The results show that the synthetic data is consistent with the real data (i.e., historical data) at the aggregated level and reasonable from the individual perspective.


Background & Summary
With the popularity of data-driven methods, data has become the foundation of urban transportation research. Although some datasets [1][2][3] that represent human mobility have been opened, they are of limited benefit for solving transportation issues, since these data are collected from non-transportation activities and cannot be interpreted directly as trip behaviour [4]. Hence, data obtained directly from the transportation system is necessary for transportation research.
In the past, limited by the capability of detectors, only traffic data at the aggregated level, such as volume, could be obtained. These data characterize traffic conditions in different dimensions, and plenty of studies [5][6][7] have built on them to assist traffic management. However, traffic conditions are formed by the trip activities of individuals, which aggregated data does not contain; only statistical values about them remain. Thus, such data cannot support travellers' behaviour mining [8], personalized transportation guidance [9], and other individual trip studies [10][11][12][13], which are in high demand for refined traffic management nowadays. Data indicating individual trip activities, which we call individual-level trip data, is therefore urgently needed. Individual-level trip data describes the micro-operation of the urban traffic system and contains each individual's trip information, including trip time, origin, destination, and path. Aggregated-level traffic data can be obtained from it by counting, so individual-level trip data can also support studies that use aggregated data.
Individual-level trip data is now available through identity detection devices combined with data processing such as trajectory reconstruction [14]. Although technically accessible, individual-level trip data is extremely difficult to obtain and open for two reasons. First, its collection is expensive and restricted by local policies, so only a few researchers who cooperate with the government can get it. Second, real individual-level trip data involves data privacy, as researchers have discussed [15], making it almost impossible to share. For this reason, studies that used individual-level trip data cannot open their datasets [4][15][16][17]. So, individual-level trip data is in high demand but hard to access.
In this paper, we present a city-scale synthetic individual-level trip dataset containing 1,829,218 trip records of 276,978 vehicle individuals in Xuancheng for one week. Each record represents one trip of an individual and contains the departure time at the minute level, the trip origin and destination represented by traffic zones, and the trip path, which consists of a sequence of roads. In addition, as a result of our data-mining work, a field indicates the traveller type of the individual, such as commuter.
Rather than removing sensitive records, which would cause statistical bias, we developed an individual trip generation method that balances data availability with the protection of individual trip privacy. Taking historical trips as input, it generates trips for each individual, which together form this synthetic dataset. In terms of aggregate metrics, such as the trip time distribution and the frequency of different origin-destination pairs over all individuals, the synthetic data is highly consistent with the real data. To protect trip privacy, however, the synthetic dataset is not precisely aligned with the real data. Still, the generated trips of each individual are reasonable and can support research and analysis at the individual level.
The synthetic individual-level vehicle trip dataset has a wide range of research uses, such as studies of travellers' trip behaviour patterns [18], trip prediction [17][19][20], travel time estimation [10][21], origin-destination pattern estimation [16], and analysis of the effect of transportation policies [22]. This dataset can also support studies that use aggregated data such as traffic volume [23][24]. Besides, since the road network data that matches this trip dataset is opened with this paper, simulation-based transportation research [11][25][26][27] can also be supported.

Original data sources
The mobility of vehicles on urban road networks can be captured by Automatic Vehicle Identification (AVI) [28] devices. With technologies such as trajectory reconstruction [14], the data directly recorded by AVI devices can be processed into individual-level trip data, which is more user-friendly and valuable. Specifically, each record covers one trip taken by an individual, including departure time, origin, destination, and trip path. Fig. 1 gives a simple example of a road network on a regular square grid, with some of these elements shown on it.
In this paper, the original trip data was collected in XuanCheng city, China, for one month (2019/8/01-2019/9/02). It is a city-scale dataset containing 823,177 vehicles and 9,002,572 trips in total. In addition to the trip information mentioned above, this original trip dataset has two characteristics. First, the trip origin and destination are represented by traffic zones, each enclosed by roads (see Fig. 1). This makes the trip data more reasonable and more usable. Second, data mining has been performed on the dataset, by which each vehicle individual was assigned a traveller type, such as commuter. The proportions of travellers and trips of different traveller types are shown in Fig. 2. In addition, Fig. 3-6 show the trip information of travellers of different types, where the road access frequency refers to the proportion of trips passing through a road (both directions are counted).

Trip generation
As shown in Fig. 7, the trips of an individual are generated one by one, and generation switches to a new individual once the previous one is complete. The generation of one trip can be decomposed into four steps. Before introducing each step, we state some definitions and notation.
Denote the number of individuals in the original dataset, who are the targets of generation, as $n$. Let $v_i$ be the $i$-th individual; then the set of individuals is $V = \{v_1, v_2, ..., v_n\}$. Some numeric variables about individuals are described in Table 1. All of them are counted over the range of the original dataset. Let $Z$ be the set of traffic zones in the city, and let $z_x \in Z$ be the traffic zone numbered $x$. Let $d_u$ be a specific day, and let $d_{u+1}$ be the day after $d_u$. Denote the time of a day as $\{t_1, t_2, ..., t_{1440}\}$, where $t_k$ represents a time period with an interval of one minute. For instance, $t_1$ indicates the time period "00:00-00:01". Further, denote $l_{m,n} = \{t_m, t_{m+1}, ..., t_n\}$ as a time slot, and the set of time slots (i.e., $\{l_{1,a}, ..., l_{b,1440}\}$) as $L$. The time periods aggregated by a time slot are continuous, and each $t_k$ belongs to exactly one $l_{m,n}$ ($\forall t_k, \exists! l_{m,n} : t_k \in l_{m,n}$). $T_p$ is a variable that indicates the current time of the generation and is updated as the generation proceeds. The format of $T_p$ is $d_u \& t_k$, representing the time period $t_k$ of day $d_u$. Denote the end time of generation as $T_e$. When $T_p > T_e$, the generation for the individual is complete and it is time to switch to a new one.
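The time indexing above can be made concrete with a small sketch. It maps a clock time to the index k of the one-minute period $t_k$ and builds a partition of the day into time slots $l_{m,n}$. The equal slot width is an assumption made here for illustration (the text does not state how slots are sized), and all function names are hypothetical.

```python
def period_index(hour: int, minute: int) -> int:
    """Return k (1..1440) such that the period t_k starts at hour:minute."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("time of day out of range")
    return hour * 60 + minute + 1


def period_label(k: int) -> str:
    """Render t_k as 'HH:MM-HH:MM', e.g. t_1 -> '00:00-00:01'."""
    if not 1 <= k <= 1440:
        raise ValueError("k must be in 1..1440")
    start, end = k - 1, k % 1440  # t_1440 ends at 00:00 of the next day
    return f"{start // 60:02d}:{start % 60:02d}-{end // 60:02d}:{end % 60:02d}"


def make_slots(width: int) -> list[tuple[int, int]]:
    """Partition t_1..t_1440 into contiguous slots (m, n).

    Assumption: equal-width slots of `width` minutes; the last slot is
    shorter when 1440 is not a multiple of `width`.
    """
    if width < 1:
        raise ValueError("width must be positive")
    return [(m, min(m + width - 1, 1440)) for m in range(1, 1441, width)]


def slot_of(k: int, slots: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the unique slot (m, n) with m <= k <= n."""
    for m, n in slots:
        if m <= k <= n:
            return (m, n)
    raise ValueError("k is not covered by the given slots")
```

Because the slots are built as consecutive, non-overlapping ranges, every $t_k$ falls into exactly one slot, which is the uniqueness property required of $L$.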

Initialization
Initialization is executed only when a switch is made during generation. It covers two cases. First, when switching to a new individual, an initial location (i.e., traffic zone) must be given as the origin of the first generated trip (after that, the destination of the last trip is set as the origin of the next trip), and the current time is initialized to the start time of the generation. Second, when switching to a new day, which includes the first day of a new individual, the trip frequency of the individual for that day is determined.
Denote the initial location of $v_i$ as $z^p_i$. We set $z^p_i$ to the traffic zone most frequently visited by $v_i$; see Eq. 1.

Let $v^f_i$ be the average daily trip frequency of $v_i$, calculated by Eq. 2, where $D$ is the number of days over which $v^f_i$ is counted. $v^f_i$ can be written as $v^f_i = \lfloor v^f_i \rfloor + \{v^f_i\}$, i.e., the sum of its integer and fractional parts. On this basis, the number of trips to be generated for the day, $v^d_i$, can be calculated by Eq. 3, where $\xi \sim B(1, \{v^f_i\})$. Denote the number of trips already generated for $v_i$ on the day as $v^{dh}_i$. Then $v^d_i - v^{dh}_i$ is the number of trips that remain to be generated. When $v^d_i - v^{dh}_i = 0$, generation switches to a new day and $v^d_i$ is recalculated. It is worth noting that, in this way, the single-day trip frequency of generated individuals is more evenly distributed.
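The daily trip count can be sketched as follows, assuming Eq. 3 takes the form $v^d_i = \lfloor v^f_i \rfloor + \xi$, with $\xi$ a Bernoulli variable whose success probability is the fractional part of $v^f_i$. This form is inferred from the decomposition into integer and fractional parts described above, and the function name is hypothetical.

```python
import math
import random


def daily_trip_count(avg_daily_freq: float, rng: random.Random) -> int:
    """Sample the number of trips to generate for one day.

    Assumed form of Eq. 3: floor(v_f) + xi, where xi ~ Bernoulli(frac(v_f)).
    The expected value of the result equals avg_daily_freq.
    """
    if avg_daily_freq < 0:
        raise ValueError("average daily trip frequency must be non-negative")
    base = math.floor(avg_daily_freq)
    frac = avg_daily_freq - base
    xi = 1 if rng.random() < frac else 0
    return base + xi


if __name__ == "__main__":
    rng = random.Random(42)
    week = [daily_trip_count(2.4, rng) for _ in range(7)]
    print(week)  # each entry is either 2 or 3
```

Under this assumption an individual's daily count only ever takes two adjacent values, which is consistent with the remark that single-day trip frequencies become more evenly distributed than in reality.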

Trip time generation
The trip time mentioned in this section refers to the departure time of trips. Its granularity is at the minute level, the same as in the original data. The trip time is determined in two steps: 1) time slot determination; 2) time period determination.
Time slot determination This step determines the departure time slot of the trip being generated. An individual's trips should be spatially continuous, i.e., once the individual's trips are ordered chronologically, the destination of the previous trip should be the origin of the next trip. To satisfy this as far as possible, we generate the trips of each traveller in time order, with $T_p$ recording the current time. In other words, the departure time of each newly generated trip is later than that of the previous one. In this way, the present location of the individual is always known, which helps guarantee spatial continuity, and generation is consistent with how individuals travel in the real world. To achieve this, we introduce the logic factor $c_s$ of time slots. Suppose $T_p = d_r \& t_d$ and a trip in $d_r$ is being generated. Define the subsequent time slot set of $T_p$ as $L_s$. On this basis, $\forall l_{m,n} \in L$, the logic factors can be calculated by Eq. 4. $\kappa$ is a very small value, and its constraints will be described later. It can be proved that $A \neq \emptyset$.
The aggregated temporal distribution of trips portrays urban traffic operation in the time dimension, which is valuable for transportation research [6][27][29]. To restore the distribution of the real data and guarantee data availability at the aggregated level, we first ensure the consistency of the proportion of trips among time periods. To achieve this, we introduce the aggregation factor $c_r$ of time slots.
Denote the number of trips taken within $l_{m,n}$ by $v_i$ as $v^l_{i,m,n}$, i.e., $v^l_{i,m,n} = \sum_{k=m}^{n} v^t_{i,k}$. Denote $u(x)$ as an aggregate function over individuals. Similarly, we denote $u_g(l_{m,n})$ as the same statistic counted on the generated data, so that the difference between generated and real data can be measured. Then the aggregation factor can be calculated by Eq. 5, where $f(x)$ is a continuous function that satisfies the constraints shown in Eq. 6.
As shown in Fig. 4, we have divided the travellers into five types based on our previous work. Their different distributions indicate that they have different proportions of trips across time slots; for example, the trips of commuters (during weekdays) are mainly concentrated in the morning and evening rush hours, which is important information for research. Therefore, to preserve this information for each type of traveller, $c_r$ is shared only among travellers of the same type when performing trip generation.
The larger the time granularity considered for the trip time, the weaker the uniqueness of individual temporal features. For instance, only one individual may trip at time periods $t_q$, $t_w$, and $t_e$ in a day, but that individual is hidden among many others once the time slots containing these periods are considered. Hence, we keep each individual's time slot choices consistent with those in the real data, to get better usability at the individual level. Two individual preference factors are designed for this: 1) the whole preference factor; 2) the location preference factor.
The whole preference factor $c_p$ of a time slot is defined by Eq. 7. Let $v^{l,o}_{i,m,n,a} = \sum_{k=m}^{n} v^{t,o}_{i,k,a}$, which denotes the trip frequency of $v_i$ with $z_a$ as the origin during $l_{m,n}$. Suppose $z^p_i = z_c$; then the location preference factor $c_{op}$ can be calculated by Eq. 7.
After defining $c_s(l_{m,n})$, $c_r(l_{m,n})$, $c_p(l_{m,n})$ and $c_{op}(l_{m,n})$, we give the formula for factor integration; see Eq. 8. The probability of each time slot being chosen is then given by Eq. 9.
Eq. 8 consists of three terms, which consider trip time logic, aggregation information, and individual preference, respectively. $\varepsilon > 0$ is a small value that ensures $c_p(l_{m,n}) \cdot (1 + c_{op}(l_{m,n})) + \varepsilon > 0$, so that a time slot is never excluded by individual preference alone. On this basis, it can be proved that $\sum_{l_{m,n} \in L} c(l_{m,n}) > 0$. So in every case, Eq. 9 can pick a time slot, which makes the selection robust. Conflicts may occur between different factors. For example, suppose the proportion of a time slot $l_{i,o}$ in the generated data is much lower than in the real data, i.e., $u_g(l_{i,o}) / \sum_{l_{i,o} \in L} u_g(l_{i,o}) - u(l_{i,o}) / \sum_{l_{i,o} \in L} u(l_{i,o}) \to -\mu$, but the individual never tripped in $l_{i,o}$ ($c_p(l_{i,o}) = 0$). Then $c_r$ gives a high value to rebalance the aggregate, while $c_p(l_{i,o}) \cdot (1 + c_{op}(l_{i,o})) + \varepsilon = \varepsilon$ is small, since $l_{i,o}$ is not preferred by the individual. To handle these conflicts, we determine the priority of the factors by the following constraints.
The constraint on $\varepsilon$ makes $\varepsilon$ hardly influence the individual preference factor. Also, $\kappa \lim_{x \to -\mu} f(x) \approx 1$ specifies that the priority of the logic factor is higher than that of the aggregation factor. To summarize, the time logic of trips is ensured first, followed by aggregate information. On this basis, the preferred trip time slots of individuals are followed.
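The final selection step can be illustrated with a short sketch. It assumes that Eq. 9 normalizes the integrated factor, i.e., that the probability of a slot is $c(l_{m,n}) / \sum_{l \in L} c(l)$; the text only states that the integrated factors sum to a positive value, so this form is an assumption. The integrated factor values are taken as given inputs (Eq. 8 itself is not reproduced here), and the function names and example values are hypothetical.

```python
import random


def slot_probabilities(factors: dict) -> dict:
    """Normalize integrated factor values c(l) into selection probabilities.

    Assumed form of Eq. 9: P(l) = c(l) / sum of c over all slots.
    """
    if any(c < 0 for c in factors.values()):
        raise ValueError("integrated factors must be non-negative")
    total = sum(factors.values())
    if total <= 0:
        raise ValueError("integrated factors must sum to a positive value")
    return {slot: c / total for slot, c in factors.items()}


def choose_slot(factors: dict, rng: random.Random):
    """Draw one time slot with probability proportional to its factor."""
    probs = slot_probabilities(factors)
    slots = list(probs)
    weights = [probs[s] for s in slots]
    return rng.choices(slots, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical integrated factors for three slots given as (m, n) pairs.
    c = {(421, 480): 0.9, (481, 540): 0.3, (1021, 1080): 0.0}
    print(slot_probabilities(c))
    print(choose_slot(c, random.Random(1)))
```

A slot whose integrated factor is zero is never drawn, while any slot with a positive factor, however small, remains selectable; this mirrors the role the text assigns to $\varepsilon$ and $\kappa$.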
Time period determination This step determines the time period $t_s$ based on the selected time slot $l_s$. In this dataset, for privacy reasons, we cannot fully expose an individual's real trip time. However, it is achievable to make the time period distribution of trips in the generated data approximate that of the real data.
Assume $l_s = l_{a,b}$ and the current time period of $T_p$ is $t_d$; then the range of time periods that can be selected is $l_{r,b}$, where $r = \max(a, d)$. Denote $u(t_k) = \sum_{i=1}^{n} v^t_{i,k}$ and $e(t_k) = u(t_k) / \sum_{j=1}^{1440} u(t_j)$. As with $u_g$, $e_g$ denotes the same statistic computed on the data generated so far. Define $\Delta e(t_k) = e(t_k) - e_g(t_k)$; the probability of selecting $t_k \in l_{r,b}$ can then be calculated by Eq. 11.
For the same reason that $c_r$ is shared only among travellers of the same type, we distinguish traveller types when computing $\Delta e(t_k)$, so as to restore the aggregated temporal distribution for each type of traveller.

Trip destination generation
In the real world, trip origin and trip time are two significant elements related to an individual's choice of trip destination. Individual trip features are mainly reflected in these spatial and spatiotemporal associations of trips, which means that this information could easily reveal the trip privacy of individuals. For privacy protection, our method protects the information about the spatiotemporal association of individual trips. In other words, only the trip origin (current location) is considered when determining the destination. Denote $v^{o,d}_{i,a,b}$ as the trip frequency of $v_i$ with $z_a$ and $z_b$ as the trip origin and destination, respectively. Define $v^o_{i,a} = \sum_m v^{o,d}_{i,a,m}$, which represents the trip frequency of $v_i$ with $z_a$ as the trip origin. Supposing the current location is $z_c$ (the origin of this trip), the destination $z_s$ is determined with the probability given by Eq. 12.
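A minimal sketch of origin-conditioned destination sampling follows. It assumes that Eq. 12 is the empirical conditional frequency $v^{o,d}_{i,c,s} / v^o_{i,c}$, which is the natural reading of the two quantities defined above but is not confirmed by the text. Trips are represented as hypothetical (origin, destination) pairs of one individual, and departure times are deliberately ignored, in line with the privacy choice described above.

```python
import random
from collections import Counter


def destination_probabilities(trips, origin):
    """Empirical P(destination | origin) for one individual's historical trips.

    Assumed form of Eq. 12: count(origin -> dest) / count(origin -> any).
    `trips` is an iterable of (origin_zone, destination_zone) pairs.
    """
    counts = Counter(dest for orig, dest in trips if orig == origin)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {zone: c / total for zone, c in counts.items()}


def sample_destination(trips, origin, rng: random.Random):
    """Draw a destination zone given the current location `origin`."""
    probs = destination_probabilities(trips, origin)
    if not probs:
        raise ValueError(f"no historical trips start from zone {origin!r}")
    zones = list(probs)
    return rng.choices(zones, weights=[probs[z] for z in zones], k=1)[0]


if __name__ == "__main__":
    history = [("z1", "z2"), ("z1", "z2"), ("z1", "z3"), ("z2", "z1")]
    print(destination_probabilities(history, "z1"))
    print(sample_destination(history, "z1", random.Random(7)))
```

Since candidates are drawn only from destinations the individual actually reached from that origin, a generated trip never leaves the spatial range of its template individual.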

Trip path and duration generation
Trip path refers to a sequence of spatially continuous roads (see Fig. 1) along which the individual travels from the origin to the destination. There are usually multiple paths between two traffic zones, and individuals' path choices affect the road flow distribution, which is significant information for traffic condition analysis. However, according to a previous study [15], just a few spatiotemporal tuples can uniquely identify most individuals. Even though all trips are generated, privacy leakage is still possible if we completely restore individual path choices. Thus, for an individual, we sample the trip path based on the choices of its crowd (e.g., random travellers), which is a form of generalization. It can recover the flow distribution of roads while concealing individual trip preferences. Specifically, denote $z_o$ and $z_d$ as the origin and destination of the trip being generated, and denote the set of trip paths that connect $z_o$ and $z_d$ as $P_{o,d}$. The probability of each trip path in $P_{o,d}$ being chosen is given by Eq. 13.
Trip duration (or travel time) is the time taken to complete the trip. It is mainly related to the length of the trip path, but it is also significantly affected by traffic control strategies, such as signal control, and by actual traffic conditions. Since the generated trip data includes departure time, origin, destination, and trip path, the trip duration can be estimated or obtained by simulation. However, considering that some users just need a plausible trip duration for analysis, we give each trip a duration retrieved from the real data. Considering the correlation between trip duration and trip time slot, we determine the duration of a generated trip by randomly sampling among historical trips with the same trip path and trip time slot. In this way, each trip's duration was actually experienced by an individual in the real world under the corresponding traffic conditions.
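The duration lookup described above can be sketched as a simple index keyed by (path, time slot). The record layout, the path string format, and the behaviour when no historical trip matches (returning None) are assumptions for illustration; the text does not say how an empty match is handled.

```python
import random
from collections import defaultdict


def build_duration_index(historical_trips):
    """Group historical durations by (path, time_slot).

    `historical_trips` is an iterable of (path, time_slot, duration) tuples;
    this layout is hypothetical.
    """
    index = defaultdict(list)
    for path, slot, duration in historical_trips:
        index[(path, slot)].append(duration)
    return index


def sample_duration(index, path, slot, rng: random.Random):
    """Pick the duration of one real trip with the same path and time slot.

    Returns None when no historical trip matches (assumed fallback).
    """
    candidates = index.get((path, slot))
    if not candidates:
        return None
    return rng.choice(candidates)


if __name__ == "__main__":
    history = [
        ("r1-r2-r3", "07:00-08:00", 540),
        ("r1-r2-r3", "07:00-08:00", 610),
        ("r1-r2-r3", "13:00-14:00", 420),
    ]
    idx = build_duration_index(history)
    print(sample_duration(idx, "r1-r2-r3", "07:00-08:00", random.Random(3)))
```

Every value returned this way is a duration that some real trip took on that path in that slot, so no interpolation or travel time model is involved.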

Data Records
The city-scale synthetic individual-level vehicle trip data is released as comma-separated values (CSV) files, containing 1,829,218 trip records of 276,978 vehicle individuals in XuanCheng for one week. The fields of a data record and their meanings are shown in Tab. 2. To support more applications, the road network of XuanCheng city, which matches this synthetic trip dataset, is also released as a Zip file. Besides, the relationship between the traffic zones and roads is released as a CSV file (see Tab. 3 for details). These data are available at the Figshare [30] repository.

Technical Validation
Although the trip data presented in this paper is synthetic and much effort has been made to protect individual trip privacy during generation, the dataset still has high value for research and application. In this section, we validate our data by comparing it with the real trip data (2019/8/12-2019/8/18) from both aggregated and individual perspectives.

Aggregated level
Aggregated level data (e.g., Fig. 4-6) refers to data formed by aggregating individuals' trips over spatial or temporal dimensions, such as the distribution of trips over time. It can be obtained from the individual-level trip data by simple counting and indicates the aggregated information of trips within the selected range. The generated data can support aggregated-level research or analysis when it is consistent with the real data at this level. Next, a series of comparisons of generated versus real data is presented. For quantitative evaluation, the Jensen-Shannon divergence (Eq. 14-15) is introduced, where $p$ and $q$ are the distributions computed from the historical data and the generated data, respectively. Besides, for two sets $S$ and $S'$ that satisfy $|S| = |S'|$, we define $|S \cap S'| / |S|$ as the overlap ratio of these two sets, which is used in the spatial dimension evaluation.
Temporal dimension The distributions of trips over time (distinguishing weekdays and holidays) of the synthetic and real data are shown in Fig. 8, and the Jensen-Shannon divergences of the two distributions are shown in Table 4. The vertical axis of Fig. 8 uses the frequency of trips to show that the quantity of generated trips is also similar to the real one. Moreover, the high consistency is maintained at smaller time scales, such as a single day or a specific time slot.
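The two evaluation measures can be sketched in a few lines. The Jensen-Shannon divergence below uses its standard definition, $\frac{1}{2} KL(p \| m) + \frac{1}{2} KL(q \| m)$ with $m = \frac{1}{2}(p + q)$; the base-2 logarithm (which bounds the value by 1) is an assumption, since Eq. 14-15 are not reproduced here. The overlap ratio assumes two equally sized sets, as for top-k comparisons, and the function names are hypothetical.

```python
import math


def _normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 vanish."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(real_counts, generated_counts):
    """Jensen-Shannon divergence between two histograms over the same bins."""
    if len(real_counts) != len(generated_counts):
        raise ValueError("histograms must have the same number of bins")
    p, q = _normalize(real_counts), _normalize(generated_counts)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def overlap_ratio(s, s_prime):
    """|S intersect S'| / |S| for two sets of equal, non-zero size."""
    s, s_prime = set(s), set(s_prime)
    if not s or len(s) != len(s_prime):
        raise ValueError("sets must be non-empty and of equal size")
    return len(s & s_prime) / len(s)


if __name__ == "__main__":
    print(js_divergence([10, 20, 30], [12, 18, 30]))
    print(overlap_ratio({"z1", "z2", "z3"}, {"z2", "z3", "z4"}))
```

Working from raw counts rather than pre-normalized distributions lets the same function compare, for example, 15-minute trip histograms of the real and synthetic weeks even when their total trip numbers differ.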

Spatial dimension
The spatial information of urban trips can be portrayed from three levels: (1) the visited hotness of traffic zones; (2) the dominated origin-destination of trips; (3) the distribution of road access frequency.
The visited hotness of traffic zones can reveal the main activity areas of travellers in the city. We have counted the top-k most frequently visited traffic zones of the synthetic and real datasets for different values of k. The resulting overlap ratios are given in Tab. 5. The hotness of traffic zones may vary over time, so we conducted further experiments, such as limiting the time to specific periods. The results show that the performance shown in Tab. 5 is stable. Further, Fig. 9-10 show the distributions of traffic zones with different hotness levels on the road network. In addition to colour distinguishing the hotness levels, a larger dot indicates a higher visit frequency, i.e., the largest red dot is the most visited traffic zone.
From the perspective of origin-destination of trips, we count the trip frequency of different origin-destination pairs in the synthetic and real data. The overlap ratios of the top-k dominant origin-destination pairs are shown in Tab. 6. The results hold when distinguishing weekdays and holidays.
The distribution of road access frequency can help identify critical roads and is valuable for urban transportation management. The access frequency proportions of major roads in the synthetic and real data are shown in Fig. 11-12. In addition, the Jensen-Shannon divergence of the two distributions is calculated and shown in Tab. 7. When the time range is limited to specific slots, such as the morning and evening rush hours, the Jensen-Shannon divergence doubles but remains at a low level. Further, the distributions of daily road access frequency on the road network are shown in Fig. 13-14.
In summary, the generated data can restore the aggregated-level information of the real data. Besides, we found that the bias between the generated and real data is close to that between two different weeks of the real data. This means that the generated data cannot be distinguished from real data by aggregated information.

Individual level
The availability of the generated trip data on the individual level is based on the reasonableness of trips from the individual perspective.Specifically, it includes two levels of information: 1) the reasonableness of a single individual's trips; 2) distributions of individual-based statistics.Next, we will validate the generated data from these two levels.

Reasonableness of single individual trips
In the real world, individual travel follows certain laws. For instance, individuals' trips are spatially continuous, and there is generally an interval of time between trips. Therefore, the trip data of an individual that follows these laws is considered reasonable. Next, we explain the reasonableness of the generated trip data in the following aspects.
• Trip frequency. The trip frequency of an individual is generally in a reasonable range. In the generation, the daily trip frequency is determined with an individual in the real world as the template. As a result, the generated individual's daily trip frequency, as well as the cumulative frequency of trips in a week, is in a reasonable range.
• Trip time interval. There is a certain time interval between two consecutive trips of an individual in the real world. Thanks to the trip time logic and preference factors in trip time determination, the generated data contains almost no pairs of trips with very short intervals (e.g., a few seconds), which would be considered abnormal and should be merged.
• Trip spatial range. When determining the destinations of generated trips, we take the destinations visited by a real individual as the candidate set. Thus the trip spatial range of a generated individual does not exceed that of its template individual in reality, which makes the trip spatial range of all individuals in the synthetic dataset reasonable.

• Spatial continuity of trips. Objectively, all trips of an individual should be spatially continuous (the destination of the previous trip is the same as the origin of the next trip). However, this cannot be completely ensured, owing to incorrect license plate recognition and vehicles driving out of the perception boundary. In the synthetic data, the trip spatial continuity ratios of commuters, stable travellers, and random travellers are 81.68%, 79.55%, and 76.96%. Compared to the real data, they improved by 10.86%, 12.55%, and 11.82%, respectively.
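One way to compute a spatial continuity ratio is sketched below. The exact definition behind the reported percentages is not given in the text; this sketch assumes the ratio is the share of consecutive trip pairs (per individual, ordered by departure time) in which the destination of the earlier trip equals the origin of the later one. The record layout is hypothetical.

```python
def continuity_ratio(trips_by_individual):
    """Share of consecutive trip pairs that are spatially continuous.

    `trips_by_individual` maps an individual ID to a list of
    (departure_time, origin_zone, destination_zone) tuples.
    Assumed definition: a pair is continuous when the destination of the
    earlier trip equals the origin of the later trip.
    Returns NaN when there is no pair of consecutive trips at all.
    """
    continuous = 0
    total = 0
    for trips in trips_by_individual.values():
        ordered = sorted(trips, key=lambda trip: trip[0])
        for previous, following in zip(ordered, ordered[1:]):
            total += 1
            if previous[2] == following[1]:
                continuous += 1
    return continuous / total if total else float("nan")


if __name__ == "__main__":
    data = {
        "a": [(480, "z1", "z2"), (1050, "z2", "z1")],
        "b": [(500, "z3", "z4"), (900, "z5", "z3")],
    }
    print(continuity_ratio(data))  # 0.5
```

To reproduce per-type figures such as those above, the same function would be applied separately to the individuals of each traveller type.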

Distributions of individual-based statistics
Although the trips are reasonable from an individual perspective, the availability of the generated data on the individual level would be decreased if the distribution of individual trip characteristics does not match a real city.In this section, we focus on the following three distributions of individual-based statistics.
• Distribution of trip frequency. The daily trip frequency of individuals is calculated based on the historical trips of real individuals. Thus, the distribution of individual trip frequency of the generated data is highly consistent with the real data.
• Distribution of the number of trip time slots. The number of time slots covered by individuals in the generated data and the real data of one week are counted and displayed in Fig. 15.
• Distribution of entropy of trip destinations. Entropy is a measure of the regularity of travellers [31]. The distributions of the generated and real data are shown in Fig. 16-17 (the trip frequency of passerby travellers is too low to analyze entropy). The colour blocks represent the proportion of individuals among all individuals with the same number of trips.
It should be noted that the dynamic range of colours is set to 0-0.12, and values higher than 0.12 are marked with the same colour as 0.12 (i.e., red), for better display of details. Since the destination candidate set for generation reflects an individual's destination choices over one month, whereas the comparison uses one week of real data, the entropy of individuals in the generated data is slightly higher than that of the real data. This reflects our data privacy protection and has little impact on data availability.
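The destination entropy of a single individual can be sketched as the Shannon entropy of the individual's destination visit frequencies. This is an assumed formulation (the exact entropy variant used for Fig. 16-17 is not spelled out in the text), and the base-2 logarithm is likewise an assumption.

```python
import math
from collections import Counter


def destination_entropy(destinations) -> float:
    """Shannon entropy (in bits) of one individual's trip destinations.

    `destinations` is a sequence of destination zone IDs, one per trip.
    0.0 means the individual always goes to the same zone; larger values
    mean less regular destination choices.
    """
    counts = Counter(destinations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts.values():
        p = c / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    print(destination_entropy(["home", "work", "home", "work"]))  # 1.0
    print(destination_entropy(["home", "home", "home"]))          # 0.0
```

An individual who draws destinations from a month-long candidate set tends to visit more distinct zones in a week than in reality, which raises this value slightly, as noted above.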

Usage Notes
All datasets opened in this paper are released as files, and users can access them in their entirety without any further permission. To make the individual-level trip dataset publicly available, we protect travellers' trip privacy as described in the Methods section, which inevitably affects the usability of the data. Here, to prevent misuse, we remind potential users of the characteristics of the synthetic (or generated) trip dataset and the tasks for which it is unsuitable. First, the average daily trip frequency of each individual in the generated trip dataset is reasonable, and the overall distribution is realistic. However, for a single individual, the distribution of single-day trip frequencies in the synthetic dataset is more uniform than in reality. Therefore, we do not recommend using this synthetic dataset to analyse the variation and patterns of individual single-day trip frequency. Second, in the synthetic dataset, travellers' temporal and spatial trip preferences are reliable, but trip spatio-temporal associations are broken. Hence, this dataset is not applicable to tasks involving the analysis of travellers' trip spatio-temporal associations.
In addition to these limitations, we make a few notes to facilitate better data usage. First, the traveller labels given in the dataset are not ground truth and can be reassigned or subdivided by the user according to the specific task. Second, individuals whose "traveller_ID" starts with "Wan_P" can be considered local vehicles (or travellers). Thus this synthetic dataset can support trip pattern analysis for local and foreign vehicles. Third, we may have prior knowledge or common sense about travelling in the city, such as that commuters usually travel to their workplace in the morning. Most travellers may behave in line with common sense, but users also need to be aware of unusual travellers. For example, in our previous analysis, we found night commuters in the city who travelled home in the morning.
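The local/foreign split mentioned in the second note can be sketched with the standard library. Only the field name "traveller_ID" and the prefix "Wan_P" come from the text; the second column name, the sample IDs, and the in-memory CSV content are hypothetical placeholders, not excerpts from the released files.

```python
import csv
import io


def split_local_foreign(rows, id_field="traveller_ID", local_prefix="Wan_P"):
    """Split trip records into local and foreign vehicles by ID prefix."""
    local, foreign = [], []
    for row in rows:
        if row[id_field].startswith(local_prefix):
            local.append(row)
        else:
            foreign.append(row)
    return local, foreign


if __name__ == "__main__":
    # Hypothetical sample; real files would be opened with open(path, newline="").
    sample = io.StringIO(
        "traveller_ID,traveller_type\n"
        "Wan_P0001,commuter\n"
        "Su_A0002,random\n"
        "Wan_P0003,stable\n"
    )
    rows = list(csv.DictReader(sample))
    local, foreign = split_local_foreign(rows)
    print(len(local), len(foreign))  # 2 1
```

Passing the field name and prefix as parameters keeps the helper usable if the released column names differ from the placeholders used here.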

Figures & Tables
Table 1. Description of some numeric variables.
Description
The trip frequency of $v_i$ with $t_k$ as departure time.
The trip frequency of $v_i$ with $z_a$ as the origin.
The trip frequency of $v_i$ with $z_b$ as the destination.
The number of trips of $v_i$ with trip path $p_k$.

Table 2 (excerpt).
The path that the trip takes (roads are separated by "-").
Duration: the length of time for completing the trip.

Time slot determination
The time slot that $t_d$ belongs to is contained in $L_s$, enabling individuals to make multiple trips in the same time slot. Denote a set $L_e \subseteq L_s$ that satisfies $|L_e| = \min(v^d_i - v^{dh}_i - 1, |L_s| - 1)$. If $|L_e| \neq 0$ (i.e., $L_e \neq \emptyset$), $L_e$ further follows the constraint: $\forall l_{m,n} \in L_e, l_{p,q} \in L_s - L_e : m > p$. Under the above constraints, $L_e$ contains the latest time slots of the day. Denote the set $A = L - L^c_s - L_e$ (i.e., $L_s - L_e$).

Figure 1. Schematic illustration of elements in road networks.
Figure 2. Proportion of travellers and trips of different traveller types.
Figure 4. Distribution of trips with time of different traveller types (Statistics and plots at 15-minute granularity).
Figure 7. The framework of the trip generation.
Figure 8. Distributions of trips with time of the synthetic data and the real data (Statistics and plots at 15-minute granularity).
Figure 9. Distribution of visited hotness of traffic zones on the road network (Commuter and other two types of traveller).
Figure 10. Distribution of visited hotness of traffic zones on the road network (High-freq and passby travellers).
Figure 11. The frequency of major roads of the synthetic and real data (Commuter and other two types of traveller).

Table 2. The synthetic (or generated) individual-level trip data attributes.
Table 3. Attributes of data about the relationship of traffic zones and roads.
Table 4. Jensen-Shannon divergences of trip frequency with time (generated data vs. real data).
Table 5. Degree of overlap of hot traffic zones.
Table 6. Degree of overlap of main OD combinations.
Table 7. Jensen-Shannon divergences of the access frequency of main roads (generated data vs. real data).