## Abstract

The polycentric city model has gained popularity in spatial planning policy, since it is believed to overcome some of the problems often present in monocentric metropolises, ranging from congestion to difficult accessibility to jobs and services. However, the concept ‘polycentric city’ has a fuzzy definition and as a result, the extent to which a city is polycentric cannot be easily determined. Here, we leverage the fine spatio-temporal resolution of smart travel card data to infer urban polycentricity by examining how a city departs from a well-defined monocentric model. In particular, we analyse the human movements that arise as a result of sophisticated forms of urban structure by introducing a novel probabilistic approach which captures the complexity of these human movements. We focus on London (UK) and Seoul (South Korea) as our two case studies, and we specifically find evidence that London displays a higher degree of monocentricity than Seoul, suggesting that Seoul is likely to be more polycentric than London.

## Introduction

### Measuring monocentricity

People in cities interact with their environment by developing urban land for different socioeconomic activities. The way in which land use is located and arranged within a city results either from self-organising mechanisms over the course of time or from specific interventions through different varieties of urban planning, at different spatial scales. In this context, the configuration is usually referred to as urban structure, and through the study of such structures we can learn more about the spatial behaviours of the societies that, over time, have built them. Moreover, urban structure also plays an important role in shaping the present and future, given its impact on different socioeconomic features such as mobility, access to jobs, social mixing, heterogeneity, segregation, deprivation, urban efficiency, and sustainability.

The simplest form of urban structure corresponds to the monocentric city, where socioeconomic activity is localised in a unique central region. In practice, monocentric cities facilitate the accumulation of social interactions and innovation, and consequently give rise to economies of agglomeration characterised by increasing returns to scale^{1,2}. However, monocentric cities are also subject to heavy tidal flows on the transport facilities during peak hours, severe congestion and disproportionately high rents close to the centre^{3,4,5}. The monocentric structure of cities prevailed until the industrial revolutions led to new forms of transport that broke the bounds of compact cities. Consequently, monocentric cities have gradually decentralised, transforming into complex hierarchies of different kinds of centres, neighbourhoods and sprawling structures that are tied together by a multiplicity of transport and information systems^{6}. Yet, explanatory models of urban structure based on a monocentric approach are still used due to their simplicity and formal analytical elegance. Their validity, however, should be questioned both in terms of the theoretical assumptions needed for the formulation of the models^{7,8,9} and from the point of view of public policy^{10}, since most plans for future cities have long abandoned the idea of the monocentre.

Polycentricity has therefore become the focus of much spatial policy^{11}, since it is believed that urban dwellers in polycentric cities might benefit from congestion relief in comparison with their monocentric counterparts^{12} and from increased accessibility to jobs and services, which may translate into higher rents and housing prices across the city, but also into more time-efficient and cost-efficient travel. Despite the rise in popularity of the idea of polycentric development, it remains a rather fuzzy concept, as it seems to mean different things to different actors and at different scales^{11,13,14,15,16}. The lack of a concise and coherent definition raises an issue: how can polycentricity be measured? If we do not know what to measure, we simply cannot measure it^{11,15,16}. In this work, instead of attempting to answer the ill-defined question ‘to what extent is a city polycentric?’, we provide an approach to analysing departures from a well-defined concept of monocentricity.

Despite the fuzziness in the definition of ‘polycentric city’, there is a long tradition of theoretical research and empirical evidence surrounding the debate on monocentricity versus polycentricity. Here we simply point to recent work, such as the analysis of data from US metropolitan areas by Arribas-Bel and Sanz-Garcia^{10}, which shows that monocentricity still retains a substantial influence on the intraurban structure of many metropolitan areas. This is despite the general consensus in the literature that modern cities above a certain size threshold become polycentric and that monocentricity is an older concept, more appropriate to cities prior to the industrial revolution. In this sense, the concept might be somewhat obsolete when dealing with the real world. Additionally, the authors of Ref.^{10} find no clear evolutionary trend towards polycentricity in US cities between 1990 and 2010.

By contrast, Alidadi and Dadashpoor^{17} analyse data from Iran to find that a monocentric model is not able to explain the spatial distribution of employment in Tehran while the main core has been losing its importance with the passage of time. Li^{18} draws upon fine-grained LandScan population data corresponding to 286 Chinese cities to find that in general, urban spatial structure has become more polycentric as well as more concentrated (i.e. with a higher share of their population living in the centres) while these changes have usually resulted in population and economic productivity growth.

There are other studies that find evidence for mixed types of urban structure. Hajrasouliha and Hamidi^{19} base their study on three typologies of urban structure: monocentricity, polycentricity, and generalised dispersion. When analysing the spatial structure of employment data from 356 US metropolitan regions, they find that mixed typologies of urban structure outnumber the three “pure” ones by almost four to one. They also find that polycentricity is somewhat more common than monocentricity. Similarly, in Ref.^{20}, Sweet et al. use cross-sectional data to estimate the relative strengths of monocentricity, polycentricity, and dispersion for characterising Canadian cities. Their results indicate that elements of each model are evident, but each tends to dominate in different contexts. When focusing on Montreal, Toronto, and Vancouver, their results imply that accessibility, municipal competition, and globalisation play a role in shaping urban structure.

### Human mobility and urban structure

In the past, most empirical studies of urban structure based their conclusions on data associated with the spatial distribution of employment or population, obtained largely from traditional sources of direct observation by questionnaires such as surveys, censuses and administrative records. The rationale behind this choice of datasets is that they are comprehensive and representative and have the potential to uncover where city dwellers conduct most of their socioeconomic activity. It has only been in recent years that the focus has been turned to alternative data sources, which can offer real-time and easy-access records at very small spatial scales. In particular, the locations that people choose to visit at different times of the day or week are very much conditioned by the spatial structure of the city and at the same time, the complexity of human movements shapes the usage of urban space and the arrangement of resources^{6}. Therefore, the study of patterns in human mobility through alternative data sources can help us understand the travel behaviour of city dwellers and it can also help us uncover the socio-economic features of urban structure.

For example, recent studies have used data derived from social media platforms as well as location tracker devices in mobile phones in order to understand how populations distribute across the urban landscape based on the places visited by the users, both at the intra-city and inter-city scales^{21,22,23,24,25,26,27,28,29,30,31,32,33}. Taxi trajectory data is another alternative source of data that has gained in popularity in recent years as a means to uncover information about urban structure^{21,34,35,36,37,38}. Taxi trajectory data not only has the potential to reveal the characteristics of human movement within the city but also real-time traffic status as well as potential social inequalities. A third alternative source of data is that derived from smart travel cards or simply, smart cards. Like the other sources of alternative data already mentioned, smart card data offers information regarding daily human activities at high resolution, both in the spatial and temporal domains and consequently, it has been used to explore urban structure^{39,40,41,42,43,44}. Here, we will focus on the latter type of new data sources that record such movements. In our case, this is smart card data from the automatic fare collection system in London’s and Seoul’s public transport, which contains information about the origin, destination and time at which each individual journey occurs.

### Aim and contribution

Our aim is to provide a novel approach to model the extent to which a city departs from the monocentric structure by considering the variability inherent in human mobility patterns, and by avoiding the fuzziness in the concept of polycentricity. To investigate the applications of this approach, we consider two case studies using high spatiotemporal resolution data derived from smart travel cards corresponding to London, United Kingdom, and Seoul, South Korea.

Our methodology first considers the frequency distribution of the length of journeys terminating at each station in the public transport system of a given city on a typical weekday. We define the “nucleus” of each city as the station representing a hypothetical centre. We also consider the network structure of the public transport system in order to measure the length of the journeys by the network distance between stations. We then introduce Poisson mixture models as a statistical approach to describe the frequency distribution of the length of the journeys terminating at each station in the transport system. The Poisson mixture models enable us to capture the variability in the human mobility patterns reflected in urban structure, which in real cities includes a blend of features from both monocentric and polycentric cities.

Next, we state what we call the *monocentric hypothesis*: “If a city was perfectly monocentric, the expected length of the journeys taken to a given station, except for the nucleus itself, would be equal to the length of the shortest path between the nucleus and the destination station”. In this hypothetical scenario, the nucleus would be the only centre for socioeconomic activity in the city, and consequently, a typical journey terminating at a given station other than the nucleus would have its origin at the nucleus. Journeys whose destination is the nucleus would have their origin at various locations across the public transport system.

In reality, cities and urban mobility patterns are more complex than the *monocentric hypothesis* assumes, so quantifying deviations from this idealised behaviour enables us to understand the extent to which a city departs from monocentricity or, in other words, to infer its degree of polycentricity indirectly.

Therefore, the main contribution of this analysis is a solution to the problem of quantitatively describing the degree of monocentricity of a city. Our data-driven approach based on mixture models takes into account the complexity of urban space, since these models are able to capture the variability in human movements that arises as a result of sophisticated forms of urban structure. Instead of considering discrete typologies of urban structure (e.g. monocentric and polycentric), the method proposed here conceptualises urban structure typologies as a spectrum, where monocentricity is an idealised extreme. While the contribution of this paper is mostly methodological, we use London and Seoul as case studies to illustrate how our method can be applied. According to the observed patterns of human mobility, we specifically find evidence that London displays a higher degree of monocentricity than Seoul, suggesting that Seoul is likely to be more polycentric than London.

The rest of the paper is organised as follows. In “Methodology”, we describe the data sets used for the analysis and how the data has been processed, explain how we conceive of the public transport system as a complex network, and introduce the probabilistic modelling framework based on mixture models. In “Experiments and results”, we present the results of the analysis corresponding to the two case studies of London and Seoul. In “Discussion and conclusions”, we provide some concluding remarks and points of discussion. We also include “Supplementary Information” with some additional results to support our findings and conclusions.

## Methodology

### Data and notation

The Oyster card in London and the T-money card in Seoul are automatic fare collection systems that record the place and time at which a traveller enters and exits the public transport system by tapping in and out with their card. In 2012, more than 80% of all journeys on public transport in London were made using the Oyster card, whereas in Seoul 98.9% of all journeys on public transport were made using T-money in 2013. For London, we use Oyster card data recorded during 5 full weekdays (24 h) between January 20 and 24, 2014, and for Seoul, we use T-money card data recorded during 4 full weekdays between December 17 and 21, 2012. We exclude the data corresponding to Wednesday, December 19, because it was a presidential election day in South Korea and regular travel patterns were disrupted.

Although our data sets contain both tap-in and tap-out records of Oyster and T-money cards, we claim that considering one type of record or the other does not affect the results of the analysis. This claim is based on the assumption that, on a daily basis, a passenger who takes a journey from one station to another, possibly with stops on the way, will typically “undo” the journey at some point during the day by returning to the station they departed from. Even though this assumption might not always hold, this behaviour is frequently displayed by passengers who commute daily to work, school or other regular activities, and who represent the majority of users of the public transport system on weekdays. Hence, in this analysis, we use tap-out records, but we claim that analogous results would be obtained using tap-in records instead.

Based on the tap-out records, we obtain the daily count of journeys of each length terminating at each station on a typical weekday. For a given journey length and a given station, the daily count on a typical weekday is computed by averaging the daily count over all the weekdays included in the raw data set and rounding the average to the nearest integer. Summed over all stations, the average daily count of journeys was 3.22 million across 382 stations in London and 5.96 million across 512 stations in Seoul.
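A minimal sketch of the averaging step, assuming (hypothetically) that each weekday's tap-outs are available as a list of `(station, journey_length)` pairs; the function name and input layout are illustrative and are not the format of the Oyster or T-money records. Since the rule for ties is not stated, averages ending in exactly .5 are rounded up here, which is an assumption.

```python
import math
from collections import Counter


def typical_weekday_counts(daily_records):
    """Average the daily count of each (station, length) pair over the weekdays.

    daily_records: one list per weekday of (station, journey_length) tap-outs.
    Returns {(station, length): rounded average daily count}, omitting zeros.
    """
    totals = Counter()
    for day in daily_records:
        totals.update(day)  # counts each (station, length) tuple
    n_days = len(daily_records)
    typical = {}
    for key, total in totals.items():
        avg = total / n_days
        rounded = math.floor(avg + 0.5)  # nearest integer; ties rounded up (assumption)
        if rounded > 0:
            typical[key] = rounded
    return typical
```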

For the subsequent parts of the analysis, we introduce the following notation. Each of the *N* stations in each city is symbolised by \(S_i\), with \(i=1,...,N\). Station \(S_i\) is the destination of \(M_i\) journeys so it has \(M_i\) tap-outs. The length of the *l*th journey terminating at station \(S_i\) is symbolised by \(L_i^l\), where *l* is an index over the \(M_i\) journeys to \(S_i\) and therefore, \(l=1,...,M_i\).

### The transport system as a complex network

We define an undirected simple network \(G = (V,E)\) as an abstract conceptualisation of the public transport system. The network *G* is formed by the set of *N* vertices or nodes *V* and the set of edges *E*. The *i*th node of the network *G* corresponds to station \(S_i\) in the public transport system. An edge is present between two nodes *i* and *j* if at least one transport line provides a direct connection between the stations \(S_i\) and \(S_j\). The distance between stations \(S_i\) and \(S_j\) is symbolised by \(d_{ij}\) and is defined as the minimum number of edges that need to be traversed in order to travel from \(S_i\) to \(S_j\). In general, the length of a journey between two stations is the number of edges traversed from the origin node to the destination node; here, we assume that the length of a journey between stations \(S_i\) and \(S_j\) is equal to the distance \(d_{ij}\), i.e. that, of all possible trajectories from \(S_i\) to \(S_j\), passengers always choose the one involving the fewest stops. We use London and Seoul as illustrative case studies, but the methodology that we propose here is generalisable to other cities.
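Because every edge counts as one step, \(d_{ij}\) can be computed by breadth-first search from \(S_i\). The sketch below assumes the network is stored as a hypothetical adjacency dictionary built from a list of directly connected station pairs; keeping neighbours in sets keeps the graph simple even when several lines serve the same pair of stations. The station names in the test are made up.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected simple graph: station -> set of directly connected stations."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # simple graph: no self-loops
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def network_distances(adjacency, source):
    """Minimum number of edges from source to every reachable station."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:  # first visit is along a shortest path
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist
```

Running `network_distances` once from the nucleus gives every \(d_{1i}\) needed later.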

For the next steps of the analysis, we establish a hypothetical centre in the network, which we call the “nucleus”. Different notions of centrality can be considered, although not all of them are suitable for our analysis. For example, if we considered that the nucleus is the closest station to the geographical centroid of the city, then the definition of centre would depend on the physical boundaries of the city region, which in turn, can be established according to a variety of different criteria. If instead, we considered a measure of centrality based on the topology of the network, such as the betweenness centrality of each node, then the nucleus in Seoul would be Wangsimni station, which does not necessarily represent what many Seoulites would consider to be a central region of Seoul. Similarly, a measure of centrality based on traffic flows may also not coincide with what most people consider to be the centre. For these reasons, we opt for a somewhat arbitrary choice of nucleus based on what is popularly considered to be a central area: Piccadilly Circus station in London and City Hall station in Seoul. We assign the index \(i=1\) to the station corresponding to the nucleus in each city. To counter the arbitrariness of these choices of nucleus, we provide a sensitivity test of the results of our analysis. This can be found in the “Supplementary Information”, in the section titled “Sensitivity analysis for different choices of nucleus”.
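For comparison with a popularly chosen nucleus, a topological candidate can be obtained by ranking stations by betweenness centrality, the alternative mentioned in the paragraph above. The sketch below implements Brandes' algorithm for an unweighted, undirected graph stored as a hypothetical adjacency dictionary of sets; it only illustrates that alternative and is not the criterion used to choose the nucleus in this analysis.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalised betweenness centrality (Brandes) for an unweighted undirected graph."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        stack, preds = [], {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS from s, recording shortest-path predecessors
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph each pair is counted from both endpoints
    return {v: c / 2 for v, c in score.items()}
```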

Figure 1 shows the physical layout of London’s and Seoul’s public transport systems, with the lines and stations that are included in the data sets. The transport lines are drawn simply as straight lines to show the topology of the network. The colour of each node corresponds to the average length of the journeys terminating at \(S_i\), symbolised by \(\bar{L}_i\), and its size represents the number of journeys \(M_i\) reaching station \(S_i\).

From Fig. 1, it is evident that the cities considered here have different spatial extents and this could be affecting the observed degree of polycentricity. One way to make the analysis more comparable between cities could be to normalise the journey length measure between stations so that both cities are on the same spatial scale. In this analysis, network distance is used to measure journey length instead of physical distance, but network distance does not always reflect the spatial extent of the cities. For instance, even if the spatial distance between an origin and a destination station is large, the network distance would remain relatively small if there are not many stations in between. Therefore, normalising the measure of network distance would not necessarily improve the comparability of the analysis for different cities. Furthermore, while the maximum spatial or network distance may be an indicator of a city’s polycentricity, it is not the only one. For example, other factors such as public transportation accessibility, land use patterns, economic development and even culture may also affect the degree of polycentricity in a city. So even if both cities were compared on the same distance scale, there would always be other factors that remain unnormalised. For these reasons, we argue that it is appropriate to keep the measure of journey length unnormalised.

### Modelling the distribution of journey lengths

In this section, we introduce a probabilistic approach to model the frequency distribution of journey lengths on a typical weekday. We regard \(L_i\) as a discrete random variable denoting the length of journeys whose destination is station \(S_i\). For each station, our data set gives \(M_i\) realisations of \(L_i\), so the observed length of the *l*th journey, symbolised by \(L_i^l\), corresponds to the *l*th realisation of \(L_i\). The true probability distribution of the random variable \(L_i\) is unknown; however, its empirical probability density function, denoted by \(\hat{f}_i\), can be obtained from the observed data as

\[
\hat{f}_i(L_i = h) = \frac{1}{M_i}\sum_{l=1}^{M_i}\mathbbm{1}_{L_i^l = h}, \tag{1}
\]

with \(h\in \mathbb {N}\). In Eq. (1), \(\mathbbm {1}_{L_i^l=h}\) is an indicator function that takes the value 1 when \(L_i^l = h\) and zero otherwise. Hence, the probability that the length of a journey with destination at station \(S_i\) is equal to *h* is approximated by \(\hat{f}_i(L_i = h)\), computed as the number of observed counts of journeys of length *h* terminating at \(S_i\), divided by the total number of journeys terminating at \(S_i\), i.e. \(M_i\).
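Eq. (1) amounts to counting how often each length occurs and dividing by \(M_i\). A minimal sketch, assuming the journey lengths for one station are available as a plain list of integers (the function name is illustrative):

```python
from collections import Counter


def empirical_pmf(lengths):
    """Empirical probability of each journey length h for one destination station.

    lengths: the M_i observed journey lengths (network distances).
    Returns {h: (number of journeys of length h) / M_i}, sorted by h.
    """
    m = len(lengths)
    counts = Counter(lengths)
    return {h: c / m for h, c in sorted(counts.items())}
```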

Under the *monocentric hypothesis* stated in “Aim and contribution”, if a city were perfectly monocentric, the expected length of the journeys taken to a given station, except for the nucleus itself, would be equal to the length of the shortest path between the nucleus and the destination station. Hence, this null hypothesis can be expressed mathematically as

\[
E[L_i] = d_{1i}, \tag{2}
\]

for \(i=2,...,N\), where \(E[L_i]\) is the expected value of the random variable \(L_i\), which can be approximated by the sample mean \(\hat{\mu }_i = \frac{1}{M_i}\sum _{l=1}^{M_i}L_i^l\). Taking this into account, the *monocentric hypothesis* can be expressed as \(\hat{\mu }_i = d_{1i}\).

In reality, the data does not lie on the line given by \(\hat{\mu }_i = d_{1i}\), as shown in Fig. 2. In the figure, the network distance from the nucleus to the destination station is represented on the *x*-axis and the average length of journeys arriving at a station is represented on the *y*-axis. Each bubble in the plot represents one station. The solid line is the regression line obtained via ordinary least squares, and the results of the linear regression are shown in Table 1. The red dotted line represents Eq. (2), i.e. the line on which points would lie if the *monocentric hypothesis* were satisfied. In fact, there is a tendency for the average length of the journeys terminating at station \(S_i\) to be less than \(d_{1i}\) as \(d_{1i}\) gets larger. This suggests that journeys terminating at stations far from the nucleus tend to take place more locally. The effect is particularly pronounced in Seoul, indicating that its observed patterns of mobility depart from the *monocentric hypothesis* to a greater extent.
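The regression line in Fig. 2 is an ordinary least squares fit of \(\hat{\mu }_i\) on \(d_{1i}\). Under the *monocentric hypothesis*, the fitted line would have intercept 0 and slope 1, so a slope below 1 quantifies how journeys shorten towards the periphery. A dependency-free sketch of the closed-form estimator; the toy values in the test are illustrative only.

```python
def ols_fit(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx  # slope
    a = mean_y - b * mean_x  # intercept
    return a, b
```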

In addition, we observe not only that \(\hat{\mu }_i = d_{1i}\) is not satisfied, but also that \(L_i\) displays a high degree of variability for \(i=1,...,N\). This effect is captured in Fig. 3, where each data point corresponds to an individual journey, the *x*-coordinate represents the distance \(d_{1i}\) between the nucleus and the destination station, and the *y*-coordinate represents the length of each individual journey \(L_i^l\), with \(l=1,...,M_i\) and \(i=1,...,N\). Once again, the red dotted line represents Eq. (2).

Figures 2 and 3 are thus a manifestation that real cities do not conform to the hypothesised monocentric scenario. Next, we explore the deviations from the *monocentric hypothesis* by leveraging the observed variability in our data.

### An approach based on mixture models

In order to describe the deviations from the hypothesised monocentric behaviour, we introduce mixture models, which are probabilistic models with the ability to represent the possible presence of different statistical sub-populations within the overall population. In the context of this paper, mixture models can be used to infer the possible presence of centres other than the nucleus based only on the data for the number and length of journeys terminating at each station. The approach that we propose here consists of assuming that the true probability distribution of the length of journeys to station \(S_i\) is given by a mixture distribution of the following form

\[
p(L_i = h \mid \textbf{w}_i, \varvec{\theta }_i) = \sum_{j=1}^{K} w_i^j \, p_i^j(L_i = h \mid \varvec{\theta }_i^j). \tag{3}
\]

In Eq. (3), the probability that a journey terminating at station \(S_i\) has length \(L_i = h\), is now conditional on the parameters of the true distribution, \(\textbf{w}_i\) and \(\varvec{\theta }_i\). The probability density function of the true distribution is given by a weighted sum of *K* probability density functions corresponding to each of the components of the mixture. The number of components in the mixture *K* corresponds to the number of centres assumed by the model. If \(K=1\), then the only centre accounted for in the model is the nucleus, but if \(K>1\), the model assumes that there are subcentres other than the nucleus. The weights of these components are given by the column vector \(\textbf{w}_i\), with *K* entries that satisfy \(\sum _{j=1}^Kw^j_i = 1\). Each component has an associated probability density function given by \(p^j_i(L_i=h|\varvec{\theta }_i^j)\), with parameters \(\varvec{\theta }_i^j\) and so, \(\varvec{\theta }_i\) is a matrix with *K* rows and as many columns as the number of parameters that characterise the probability density function \(p^j_i\).

For the purposes of our analysis, we set \(p^j_i\) to be a Poisson distribution with parameter \(\mu _i^j\), so that \(p^j_i(L_i=h|\varvec{\theta }_i^j)\) is now \(p^j_i(L_i=h|\mu _i^j) = \frac{1}{h!}(\mu _i^j)^h\exp (-\mu _i^j)\), for \(i=1,...,N\) and \(j=1,...,K\). This choice of distribution is motivated by the fact that the length of journeys terminating at station \(S_i\) represented by random variable \(L_i\) is in the form of count data, but also by the mathematical simplicity of the Poisson distribution, which is characterised by only one parameter.
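Evaluating the Poisson probability in log space, using `math.lgamma(h + 1)` \(= \log h!\), avoids overflow of \(h!\) for long journeys. A minimal sketch of the mixture probability in Eq. (3) with Poisson components, assuming the weights and means are given as plain lists (function names are illustrative):

```python
import math


def poisson_pmf(h, mu):
    """Poisson probability mu**h * exp(-mu) / h!, computed in log space."""
    return math.exp(h * math.log(mu) - mu - math.lgamma(h + 1))


def mixture_pmf(h, weights, means):
    """Sum over components j of w^j * Poisson(h; mu^j)."""
    return sum(w * poisson_pmf(h, mu) for w, mu in zip(weights, means))
```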

In order to find the maximum likelihood estimates of \(\textbf{w}_i\) and \(\varvec{\theta }_i\) given the observed data for station \(S_i\), we apply the expectation-maximisation algorithm. The number of components *K* for the Poisson mixtures is one of the hyperparameters of the algorithm and needs to be determined before the learning process. In “Experiments and results”, we discuss different choices for the number of components *K*.
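A minimal expectation-maximisation sketch for a *K*-component Poisson mixture. It works on the histogram of journey lengths at one station (length to count) rather than on individual journeys, which yields the same estimates but is cheaper when \(M_i\) is large. The function names, the initialisation (means spread evenly over the observed range), the tolerance and the iteration cap are illustrative assumptions; EM can converge to local optima, so trying several initialisations is advisable in practice.

```python
import math


def _log_poisson(h, mu):
    """log Poisson(h; mu), in log space to avoid overflow for large h."""
    return h * math.log(mu) - mu - math.lgamma(h + 1)


def _e_step(hs, ns, weights, means):
    """Responsibilities P(component j | L = h) and the total log-likelihood."""
    resp, loglik = [], 0.0
    for h, n in zip(hs, ns):
        logs = [(math.log(w) if w > 0 else -math.inf) + _log_poisson(h, mu)
                for w, mu in zip(weights, means)]
        top = max(logs)
        norm = top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp
        resp.append([math.exp(v - norm) for v in logs])
        loglik += n * norm
    return resp, loglik


def fit_poisson_mixture(counts, k, n_iter=1000, tol=1e-8):
    """Fit a k-component Poisson mixture to {journey length h: number of journeys}.

    Returns (weights, means, log_likelihood).
    """
    hs = sorted(counts)
    ns = [counts[h] for h in hs]
    total = sum(ns)
    lo, hi = hs[0], hs[-1]
    # Hypothetical initialisation: means spread evenly across the observed range
    means = [max(lo + (j + 1) * (hi - lo) / (k + 1), 0.1) for j in range(k)]
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(n_iter):
        resp, loglik = _e_step(hs, ns, weights, means)
        if loglik - prev < tol:  # EM never decreases the likelihood
            break
        prev = loglik
        # M-step: weight = share of journeys, mean = responsibility-weighted length
        for j in range(k):
            nj = sum(n * r[j] for n, r in zip(ns, resp))
            weights[j] = nj / total
            if nj > 0:
                mean_j = sum(n * r[j] * h for h, n, r in zip(hs, ns, resp)) / nj
                means[j] = max(mean_j, 1e-9)
    _, loglik = _e_step(hs, ns, weights, means)
    return weights, means, loglik
```

With \(K=1\) the M-step reduces to the sample mean, consistent with the single-Poisson case.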

## Experiments and results

The case where \(K=1\) is equivalent to a simple Poisson distribution. The maximum likelihood estimator for the Poisson parameter \(\mu _i\) corresponding to station \(S_i\) is the average of the \(M_i\) observations of \(L_i\), i.e. the sample mean \(\hat{\mu }_i\). Therefore, Fig. 2, which plots \(\hat{\mu }_i\) against \(d_{1i}\), also visualises the relation between the estimated Poisson parameter corresponding to each station and the distance between the nucleus and that station.
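For completeness, this follows from setting the derivative of the Poisson log-likelihood for station \(S_i\) to zero:

\[
\ell(\mu_i) = \sum_{l=1}^{M_i}\left(L_i^l \log \mu_i - \mu_i - \log L_i^l!\right), \qquad
\frac{d\ell}{d\mu_i} = \frac{1}{\mu_i}\sum_{l=1}^{M_i} L_i^l - M_i = 0
\;\Longrightarrow\;
\hat{\mu}_i = \frac{1}{M_i}\sum_{l=1}^{M_i} L_i^l .
\]

The second derivative, \(-\mu_i^{-2}\sum_{l=1}^{M_i} L_i^l\), is negative whenever at least one observed length is positive, so this stationary point is the maximum.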

### Poisson mixture model with two components

To account for the presence of centres other than the nucleus, we introduce Poisson mixture models with \(K=2\). The parameters of a Poisson mixture with two components are the weights \(\textbf{w}_i = (w^1_i, w^2_i)\) and the distribution parameters for each component \(\varvec{\theta }_i = (\mu ^1_i, \mu ^2_i)\), which are also the means of the components. We obtain the maximum likelihood estimates of these parameters by applying the expectation-maximisation algorithm to the data corresponding to \(L_i\), and we denote them by \(\hat{w}^1_i\), \(\hat{w}^2_i\), \(\hat{\mu }^1_i\) and \(\hat{\mu }^2_i\). We refer to the component with the lower estimated mean as the proximal component, and denote its weight by \(\hat{w}^p_i\) and its mean by \(\hat{\mu }^p_i\), so that \(\hat{\mu }^p_i = \min( \hat{\mu }^1_i, \hat{\mu }^2_i)\). Similarly, we call the component with the higher estimated mean the distal component, and denote its weight by \(\hat{w}^d_i\) and its mean by \(\hat{\mu }^d_i\), so that \(\hat{\mu }^d_i = \max( \hat{\mu }^1_i, \hat{\mu }^2_i)\). The estimated weights give the expected shares of the \(M_i\) journeys belonging to each component, and the fitted 2-component Poisson mixture model enables us to classify each individual observation of \(L_i\) as belonging to the proximal or distal component according to its posterior membership probability. We interpret journeys belonging to the proximal component as taking place at a local scale and journeys belonging to the distal component as taking place at a global, city-wide scale.
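A sketch of how fitted components can be labelled by their means and how a single journey can be assigned to its most probable component. It takes hypothetical lists of estimated weights and means (for example, the output of an EM fit) and also handles \(K=3\), where the middle component is labelled medial. Assigning each journey to the component with the highest posterior probability is one possible rule, shown as an assumption rather than the exact procedure used here.

```python
import math

LABELS = {2: ("proximal", "distal"), 3: ("proximal", "medial", "distal")}


def label_components(weights, means):
    """Map fitted components to names ordered by mean (lowest mean = proximal)."""
    if len(means) not in LABELS:
        raise ValueError("labels are defined for K = 2 or K = 3 only")
    order = sorted(range(len(means)), key=lambda j: means[j])
    return {name: (weights[j], means[j]) for name, j in zip(LABELS[len(means)], order)}


def assign_journey(h, labelled):
    """Name of the component with the highest posterior probability for length h."""
    def log_joint(name):
        # log w + log Poisson(h; mu); the posterior is proportional to this joint
        w, mu = labelled[name]
        return math.log(w) + h * math.log(mu) - mu - math.lgamma(h + 1)
    return max(labelled, key=log_joint)
```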

Figure 4 shows the relationship between the proximal and distal means corresponding to a given station \(S_i\) and the network distance \(d_{1i}\) between \(S_i\) and the nucleus \(S_1\). Both the proximal and distal means, \(\hat{\mu }^p_i\) and \(\hat{\mu }^d_i\), are represented on the *y*-axis. The size of the data points is proportional to the number of journeys terminating at each station. The results of the regression are displayed in Table 2. In the “Supplementary Information”, we discuss the relationship between \(d_{1i}\) and the weights \(\hat{w}^p_i\) and \(\hat{w}^d_i\) corresponding to station \(S_i\).

As \(d_{1i}\) becomes larger, there is no significant increase in the proximal mean \(\hat{\mu }^p_i\), which remains around 5 and never exceeds 10. The effect is strikingly obvious in the case of Seoul. These observations are likely to be the consequence of the existence of other socioeconomic centres, closer to the destination station \(S_i\), where passengers prefer to travel to carry out some socioeconomic activities at a more local level. In contrast, the distal mean \(\hat{\mu }^d_i\) displays a significant linear growth with \(d_{1i}\). The distal component captures long-distance, city-wide journeys from stations that are possibly close to the nucleus to stations in the peripheral regions of the city.

### Poisson mixture model with three components

Similarly, with \(K=3\), the parameters to be estimated are \(\textbf{w}_i = (w^1_i, w^2_i, w^3_i)\) and \(\varvec{\theta }_i = (\mu ^1_i, \mu ^2_i, \mu ^3_i)\). The proximal and distal components are defined as for \(K=2\), i.e. as the components with the lowest and highest estimated means. We call the remaining third component the medial component, and we denote its estimated weight by \(\hat{w}^m_i\) and its estimated mean by \(\hat{\mu }^m_i\). The results of the linear regression between the mean of each component and \(d_{1i}\) are gathered in Table 3. Figure 5 represents the relationship between the proximal, medial and distal means corresponding to each station \(S_i\), i.e. \(\hat{\mu }^p_i\), \(\hat{\mu }^m_i\) and \(\hat{\mu }^d_i\), and the distance between \(S_i\) and the nucleus. In the “Supplementary Information”, we discuss the relationship between \(d_{1i}\) and the weights of the three-component Poisson mixture model for station \(S_i\), i.e. \(\hat{w}^p_i\), \(\hat{w}^m_i\) and \(\hat{w}^d_i\).

The behaviour of the proximal and distal components is analogous to the \(K=2\) case. However, adding a third component allows us to capture the variability in the data in even more detail. In principle, Poisson mixture models place no limit on the number of components; however, increasing *K* indefinitely is not always sensible, since it could lead to overfitting and hinder the interpretability of the outcomes. For this reason, we recommend keeping *K* at 2 or 3 as a good trade-off between capturing the detail in the data variability and keeping the components meaningful without overcomplicating the model.
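No formal criterion for choosing *K* is prescribed above. One common option, shown here purely as an illustration and not as part of the methodology of this paper, is the Bayesian information criterion (BIC), which penalises the \(2K-1\) free parameters of a *K*-component Poisson mixture (*K* means and \(K-1\) free weights). The log-likelihood values in the usage line are made up for illustration.

```python
import math


def poisson_mixture_bic(loglik, k, n_journeys):
    """BIC = (free parameters) * log(n) - 2 * log-likelihood; lower is better."""
    n_params = 2 * k - 1  # k means plus k - 1 free weights (weights sum to one)
    return n_params * math.log(n_journeys) - 2.0 * loglik


def choose_k(logliks, n_journeys):
    """Return the K with the lowest BIC from a {K: maximised log-likelihood} dict."""
    return min(logliks, key=lambda k: poisson_mixture_bic(logliks[k], k, n_journeys))


# Hypothetical maximised log-likelihoods for one station with 500 journeys
print(choose_k({1: -1200.0, 2: -1000.0, 3: -998.0}, 500))  # prints 2
```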

## Discussion and conclusions

Our work constitutes a novel approach to the study of urban structure. It combines a probabilistic modelling framework based on Poisson mixture models with new forms of data. Using only the lengths of journeys taken on the public transport system, the framework enables us to disaggregate the journeys into several statistical subpopulations according to their destination station and their length, measured as network distances between stations. The methodology relies on the *monocentric hypothesis*. This null hypothesis states that a perfectly monocentric city has only one centre, which we call the nucleus, where all socioeconomic activity is concentrated. Consequently, under the *monocentric hypothesis*, all journeys terminating at any station other than the nucleus should originate at the nucleus. Analysing deviations from the *monocentric hypothesis* allows us to infer the degree of polycentricity of a city by detecting the potential presence of centres other than the nucleus.
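Under the *monocentric hypothesis*, every journey ending at a station \(S_i\) other than the nucleus starts at the nucleus. Its length therefore equals \(d_{1i}\), and the journey-length histogram collapses to a single spike. The snippet below is a hypothetical diagnostic, not the measure used in this paper. It computes the share of journeys consistent with that spike, which equals 1 under the null and drops as journeys arrive from other origins. The function name and counts are invented for illustration.

```python
from typing import Dict


def monocentric_share(hist: Dict[int, int], d1i: int) -> float:
    """Fraction of journeys ending at S_i whose length equals d_1i (nucleus-to-S_i distance).

    Under the monocentric hypothesis every journey to S_i starts at the nucleus,
    so the journey-length histogram is a point mass at d_1i and this share is 1.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    return hist.get(d1i, 0) / total


# Hypothetical station 8 stops from the nucleus: {stops away: number of journeys}.
print(monocentric_share({3: 40, 5: 35, 8: 15, 12: 10}, d1i=8))  # → 0.15
```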

Applying the proposed general methodology to the two case studies of London and Seoul leads to the following key findings. Firstly, the distribution of the length of journeys terminating at any destination station displays a high degree of variability in both London and Seoul. Secondly, we model the length of journeys terminating at each station in the public transport system with a single-component Poisson distribution. We observe that, especially in the case of Seoul, the most frequent length of journeys terminating at a given station is shorter than the distance between the nucleus and that station, possibly due to the presence of closer, more local urban centres. Thirdly, by introducing the two-component Poisson mixture model, we are able to assign each individual journey terminating at a station to what we call the proximal or the distal component. The proximal component corresponds to journeys that take place at a local scale, whereas the distal component corresponds to journeys that take place at a global, city-wide scale. We find that, regardless of the distance between the destination station and the nucleus, the most frequently observed journey length associated with the proximal component is around 5, measured as network distance, i.e. number of stops. These observations are particularly clear in the case of Seoul. They are likely to be a consequence of the existence of other socioeconomic centres, closer to the destination station, where passengers may prefer to travel to carry out some socioeconomic activities at a more local level. Conversely, consider destination stations that are close to the nucleus. For these stations, the most frequently observed journey length associated with the distal component is larger than the distance from the nucleus to the destination station. This shows that passengers who terminate their journey at one of these stations may be travelling not only from the nucleus, but also from other, more distant origin stations.
As the distance from the nucleus to the destination station increases, the most frequently observed journey length associated with the distal component increases rapidly. This indicates that the distal component captures long-distance, city-wide journeys from origin stations that are possibly close to the nucleus to stations in the peripheral regions of the city. Finally, increasing the number of components in the Poisson mixture model can help unpick details in the variability of the data; however, it can also make the model too complex and result in overfitting. We also tested other choices of nucleus, namely all the stations within a buffer distance of the initial choice, and found that the patterns described above still hold.
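Assigning an individual journey to the proximal or distal component can be done with posterior component probabilities (responsibilities). This sketch assumes that fitted weights and means are already available for the station, with the means sorted in ascending order. It uses a maximum a posteriori rule, and the fitted values in the usage example are hypothetical.

```python
import math
from typing import List


def posterior(k: int, weights: List[float], means: List[float]) -> List[float]:
    """Posterior probability that a journey of length k belongs to each Poisson component."""
    logs = [math.log(w) + k * math.log(mu) - mu - math.lgamma(k + 1)
            for w, mu in zip(weights, means)]
    top = max(logs)  # log-sum-exp for numerical stability
    probs = [math.exp(v - top) for v in logs]
    norm = sum(probs)
    return [p / norm for p in probs]


def classify(k: int, weights: List[float], means: List[float]) -> str:
    """Assign a journey to the most probable component (two components, means ascending)."""
    post = posterior(k, weights, means)
    j = max(range(len(post)), key=post.__getitem__)
    return ["proximal", "distal"][j]


w_fit, mu_fit = [0.6, 0.4], [5.0, 20.0]  # hypothetical fitted parameters, means ascending
for length in (2, 6, 12, 18, 30):
    print(length, classify(length, w_fit, mu_fit))
```

A hard assignment discards uncertainty near the crossover length. Keeping the full `posterior` vector is an alternative when journeys of intermediate length matter.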

Understanding urban structure from data on urban transport systems has significant implications for urban policy. The methodology outlined in this paper offers a solution to the elusive problem of quantifying the degree of polycentricity of different urban areas. We argue that our method addresses the fuzziness of the concept of polycentricity. It allows for better-defined terminology that could help policy-makers communicate ideas about urban structure more precisely. In particular, the recent drive to centralise activities in more compact cities, with its emphasis on inner-city and city-centre living, could be better informed by this approach. The difficulty of moving towards more compact urban structures might be measured by the parameters of the Poisson mixture distributions. In this sense, these distributions can reveal not only how the centricity of cities might change under different travel regimes but also how travel behaviour itself might be altered.

Additionally, our approach is relevant to those interested in the more theoretical, and even historical, aspects of urban areas, and it prompts questions for future research. As our findings for the cities analysed here indicate, London’s case aligns more closely with the scenario depicted by the *monocentric hypothesis* than Seoul’s. The construction of London’s transport network started in the 19th century. The construction of Seoul’s started in 1971, approximately a century later, and over those years several studies have reported a tendency for cities to become increasingly polycentric. Suppose the layout of the transport network and passengers’ travel behaviour are yet another manifestation of urban structure. Then our finding that London is more monocentric than Seoul should not come as a surprise. But does this assumption hold in general? Has London’s early construction of a public transport network conditioned its urban structure and slowed its transition towards a more polycentric arrangement like Seoul’s? There is considerable scope for extending this type of analysis to the evolution of past cities. Where possible, this would mean developing ways in which mixture models like these can reveal how spatial behaviours can change and have changed over decades. The availability of public transport data will always be a constraint for such historical analysis. However, these methods can easily be extended to other trip distributions, such as the journey to work by different modes and over very different time intervals, wherever data are available.

This work is limited by the fact that the data sets used to illustrate the method are relatively small: they cover only two cities, a few weekdays and some modes of transport. We reiterate, however, that the focus of this paper is the proposed methodology, and the analysis of data from London and Seoul only serves to illustrate how to apply the method. The study is also limited in that we can only consider the length of each journey as a network distance. This is because the tap-out data set available for the analysis only records, for each destination station, the number of journeys that began a given number of stations away. Since the origin station is unknown, we cannot consider other valuable information, such as the Euclidean distance or the duration of each journey. Analogous limitations would apply to the tap-in data set. The main reason for these limitations is the difficulty in obtaining information about individual journeys due to data protection issues. Finally, our work only explores one temporal scale. A deeper understanding of urban structure could be obtained by studying data at different times of the day and over time periods of different lengths, and by analysing the temporal evolution of the data. Nevertheless, as demonstrated above, the limited data are sufficient for the purpose of validating our methodology.
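Because the tap-out data only record, for each destination station, how many journeys began a given number of stops away, each station's data reduce to a histogram. The following is a minimal sketch of fitting the single-component Poisson model to such a histogram. It assumes the data are stored as a dictionary `{stops away: number of journeys}`; this layout and the function name are hypothetical. The maximum-likelihood estimate of the Poisson mean is the count-weighted average length. The most frequent length of a Poisson distribution with mean \(\mu\) is \(\lfloor \mu \rfloor\), and when \(\mu\) is an integer, \(\mu - 1\) is tied with it.

```python
import math
from typing import Dict, Tuple


def fit_single_poisson(hist: Dict[int, int]) -> Tuple[float, int]:
    """MLE of a single Poisson for tap-out counts {stops away: number of journeys}.

    Returns the fitted mean and the most frequent (modal) journey length floor(mu).
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    mu = sum(k * n for k, n in hist.items()) / total  # count-weighted average length
    return mu, math.floor(mu)


# Hypothetical destination station.
hist = {1: 120, 2: 340, 3: 410, 4: 380, 5: 260, 6: 150, 8: 60, 12: 20}
print(fit_single_poisson(hist))
```

Comparing the returned mode with the station's distance to the nucleus, \(d_{1i}\), is one way to check the single-component finding described in the text.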

## Data and code availability

The data and code used in this paper are available on GitHub at https://github.com/CrmnCA/inferring-urban-polycentricity-from-variability-in-human-mobility.

## References

1. Anas, A. Agglomeration and taste heterogeneity: Equilibria, stability, welfare and dynamics. *Regional Sci. Urban Econ.* **18**, 7–35. https://doi.org/10.1016/0166-0462(88)90003-8 (1988).
2. Rosenthal, S. S. & Strange, W. C. The determinants of agglomeration. *J. Urban Econ.* **50**, 191–229. https://doi.org/10.1006/juec.2001.2230 (2001).
3. Ahlfeldt, G. & Wendland, N. How polycentric is a monocentric city? Centers, spillovers and hysteresis. *J. Econ. Geogr.* **13**, 53–83 (2013).
4. Anas, A., Arnott, R. & Small, K. The panexponential monocentric model. *J. Urban Econ.* **47**, 165–179 (2000).
5. Huai, Y., Lo, H. K. & Ng, K. F. Monocentric versus polycentric urban structure: Case study in Hong Kong. *Transport. Res. Part A Policy Practice* **151**, 99–118. https://doi.org/10.1016/j.tra.2021.05.004 (2021).
6. Zhong, C., Arisona, S. M., Huang, X., Batty, M. & Schmitt, G. Detecting the dynamics of urban structure through spatial network analysis. *Int. J. Geogr. Inf. Sci.* **28**, 2178–2199. https://doi.org/10.1080/13658816.2014.914521 (2014).
7. Wheaton, W. C. *Monocentric Models of Urban Land-Use: Contributions and Criticism* (Johns Hopkins Press, 1979).
8. Griffith, D. A. Modelling urban population density in a multi-centered city. *J. Urban Econ.* **9**, 298–310. https://doi.org/10.1016/0094-1190(81)90029-2 (1981).
9. Berry, B. J. L. & Kim, H.-M. Challenges to the monocentric model. *Geogr. Anal.* **25**, 1–4. https://doi.org/10.1111/j.1538-4632.1993.tb00275.x (1993).
10. Arribas-Bel, D. & Sanz-Gracia, F. The validity of the monocentric city model in a polycentric age: US metropolitan areas in 1990, 2000 and 2010. *Urban Geogr.* **35**, 980–997. https://doi.org/10.1080/02723638.2014.940693 (2014).
11. Green, N. Functional polycentricity: A formal definition in terms of social network analysis. *Urban Stud.* **44**, 2077–2103. https://doi.org/10.1080/00420980701518941 (2007).
12. Brinkman, J. C. Congestion, agglomeration, and the structure of cities. *J. Urban Econ.* **94**, 13–31. https://doi.org/10.1016/j.jue.2016.05.002 (2016).
13. Davoudi, S. EUROPEAN BRIEFING: Polycentricity in European spatial planning: From an analytical tool to a normative agenda. *Eur. Plan. Stud.* **11**, 979–999. https://doi.org/10.1080/0965431032000146169 (2003).
14. Kloosterman, R. C. & Musterd, S. The polycentric urban region: Towards a research agenda. *Urban Stud.* **38**, 623–633. https://doi.org/10.1080/00420980120035259 (2001).
15. Meijers, E. Measuring polycentricity and its promises. *Eur. Plan. Stud.* **16**, 1313–1323. https://doi.org/10.1080/09654310802401805 (2008).
16. Rauhut, D. Polycentricity—One concept or many? *Eur. Plan. Stud.* **25**, 332–348. https://doi.org/10.1080/09654313.2016.1276157 (2017).
17. Alidadi, M. & Dadashpoor, H. Beyond monocentricity: Examining the spatial distribution of employment in Tehran metropolitan region, Iran. *Int. J. Urban Sci.* **22**, 38–58. https://doi.org/10.1080/12265934.2017.1329024 (2018).
18. Li, Y. Towards concentration and decentralization: The evolution of urban spatial structure of Chinese cities, 2001–2016. *Comput. Environ. Urban Syst.* **80**, 101425. https://doi.org/10.1016/j.compenvurbsys.2019.101425 (2020).
19. Hajrasouliha, A. H. & Hamidi, S. The typology of the American metropolis: Monocentricity, polycentricity, or generalized dispersion? *Urban Geogr.* **38**, 420–444. https://doi.org/10.1080/02723638.2016.1165386 (2017).
20. Sweet, M. N., Bullivant, B. & Kanaroglou, P. S. Are major Canadian city-regions monocentric, polycentric, or dispersed? *Urban Geogr.* **38**, 445–471. https://doi.org/10.1080/02723638.2016.1200279 (2017).
21. Yan, X.-Y., Zhao, C., Fan, Y., Di, Z. & Wang, W.-X. Universal predictability of mobility patterns in cities. *J. R. Soc. Interface* **11**, 20140834. https://doi.org/10.1098/rsif.2014.0834 (2014).
22. Tu, W. *et al.* Coupling mobile phone and social media data: A new approach to understanding urban functions and diurnal patterns. *Int. J. Geogr. Inf. Sci.* **31**, 2331–2358. https://doi.org/10.1080/13658816.2017.1356464 (2017).
23. Yin, J., Soliman, A., Yin, D. & Wang, S. Depicting urban boundaries from a mobility network of spatial interactions: A case study of Great Britain with geo-located Twitter data. *Int. J. Geogr. Inf. Sci.* **31**, 1293–1313. https://doi.org/10.1080/13658816.2017.1282615 (2017).
24. Yang, X. *et al.* Understanding the spatial structure of urban commuting using mobile phone location data: A case study of Shenzhen, China. *Sustainability* **10**, 1435 (2018).
25. Mazzoli, M. *et al.* Field theory for recurrent mobility. *Nat. Commun.* **10**, 3895. https://doi.org/10.1038/s41467-019-11841-2 (2019).
26. Zhu, W., Ma, D., Zhao, Z. & Guo, R. Investigating the complexity of spatial interactions between different administrative units in china using Flickr data. *Sustainability*. https://doi.org/10.3390/su12229778 (2020).
27. Xiao, Y., Wang, Y., Miao, S. & Niu, X. Assessing polycentric urban development in Shanghai, China, with detailed passive mobile phone data. *Environ. Plan. B Urban Anal. City Sci.* **48**, 2656–2674. https://doi.org/10.1177/2399808320982306 (2021).
28. Liu, X. *et al.* Analysis of urban agglomeration structure through spatial network and mobile phone data. *Trans. GIS* **25**, 1949–1969. https://doi.org/10.1111/tgis.12755 (2021).
29. Ponce-Lopez, R. & Ferreira, J. Identifying and characterizing popular non-work destinations by clustering cellphone and point-of-interest data. *Cities* **113**, 103158. https://doi.org/10.1016/j.cities.2021.103158 (2021).
30. Miao, R., Wang, Y. & Li, S. Analyzing urban spatial patterns and functional zones using sina weibo poi data: A case study of Beijing. *Sustainability*. https://doi.org/10.3390/su13020647 (2021).
31. Song, Z. *et al.* Building-level urban functional area identification based on multi-attribute aggregated data from cell phones—A method combining multidimensional time series with a som neural network. *ISPRS Int. J. Geo-Inform.* https://doi.org/10.3390/ijgi11020072 (2022).
32. Wang, Y., Zhong, C., Gao, Q. & Cabrera-Arnau, C. Understanding internal migration in the UK before and during the COVID-19 pandemic using twitter data. *Urban Inform.* **1**, 15. https://doi.org/10.1007/s44212-022-00018-w (2022).
33. Rowe, F., Calafiore, A., Arribas-Bel, D., Samardzhiev, K. & Fleischmann, M. Urban exodus? Understanding human mobility in Britain during the COVID-19 pandemic using Meta-Facebook data. *Populat. Space Place* **29**, e2637. https://doi.org/10.1002/psp.2637 (2023).
34. Zhou, Y., Fang, Z., Thill, J.-C., Li, Q. & Li, Y. Functionally critical locations in an urban transportation network: Identification and space-time analysis using taxi trajectories. *Comput. Environ. Urban Syst.* **52**, 34–47. https://doi.org/10.1016/j.compenvurbsys.2015.03.001 (2015).
35. Nie, W.-P., Zhao, Z.-D., Cai, S.-M. & Zhou, T. Understanding the urban mobility community by taxi travel trajectory. *Commun. Nonlinear Sci. Numer. Simulat.* **101**, 105863. https://doi.org/10.1016/j.cnsns.2021.105863 (2021).
36. Liu, E.-J. & Yan, X.-Y. A universal opportunity model for human mobility. *Sci. Rep.* **10**, 4657. https://doi.org/10.1038/s41598-020-61613-y (2020).
37. Li, X., Ma, X. & Wilson, B. Beyond absolute space: An exploration of relative and relational space in Shanghai using taxi trajectory data. *J. Transp. Geogr.* **93**, 103076. https://doi.org/10.1016/j.jtrangeo.2021.103076 (2021).
38. Choi, J., No, W., Park, M. & Kim, Y. Inferring land use from spatialtemporal taxi ride data. *Appl. Geogr.* **142**, 102688. https://doi.org/10.1016/j.apgeog.2022.102688 (2022).
39. Roth, C., Kang, S. M., Batty, M. & Barthélemy, M. Structure of urban movements: Polycentric activity and entangled hierarchical flows. *PLOS ONE* **6**, 1–8. https://doi.org/10.1371/journal.pone.0015923 (2011).
40. Maeda, T. N., Mori, J., Hayashi, I., Sakimoto, T. & Sakata, I. Comparative examination of network clustering methods for extracting community structures of a city from public transportation smart card data. *IEEE Access* **7**, 53377–53391. https://doi.org/10.1109/ACCESS.2019.2911567 (2019).
41. Yang, Y., Heppenstall, A., Turner, A. & Comber, A. A spatiotemporal and graph-based analysis of dockless bike sharing patterns to understand urban flows over the last mile. *Comput. Environ. Urban Syst.* **77**, 101361. https://doi.org/10.1016/j.compenvurbsys.2019.101361 (2019).
42. Tang, L., Zhao, Y., Tsui, K. L., He, Y. & Pan, L. A clustering refinement approach for revealing urban spatial structure from smart card data. *Appl. Sci.* https://doi.org/10.3390/app10165606 (2020).
43. Aslam, N. S., Zhu, D., Cheng, T., Ibrahim, M. R. & Zhang, Y. Semantic enrichment of secondary activities using smart card data and point of interests: A case study in London. *Ann. GIS* **27**, 29–41. https://doi.org/10.1080/19475683.2020.1783359 (2021).
44. Zhang, Y., Marshall, S., Cao, M., Manley, E. & Chen, H. Discovering the evolution of urban structure using smart card data: The case of London. *Cities* **112**, 103157. https://doi.org/10.1016/j.cities.2021.103157 (2021).

## Acknowledgements

We are very grateful to Prof HaeRan Shin from Seoul National University for kindly helping us get access to T-money data. C.Z., M.B. and R.S. are Turing Fellows at the Alan Turing Institute. The data analysis was done within UCL.

## Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 949670), and from ESRC under JPI Urban Europe/NSFC (Grant No. ES/T000287/1).

## Author information

### Contributions

C.C.-A.: conceived the study, designed the methodology, carried out the data analysis, contributed towards the interpretation of results, wrote and revised the manuscript; C.Z.: conceived the study, provided the London data, contributed towards the interpretation of results; M.B.: contributed towards the interpretation of results, participated in writing and revising the manuscript; R.S.: conceived the study, designed the methodology, contributed towards the interpretation of results; S.M.K.: conceived the study, provided the Seoul data, contributed towards the interpretation of results and the writing and revision of the manuscript. All authors gave their final approval for publication.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

### Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Cabrera-Arnau, C., Zhong, C., Batty, M. *et al.* Inferring urban polycentricity from the variability in human mobility patterns.
*Sci Rep* **13**, 5751 (2023). https://doi.org/10.1038/s41598-023-33003-7

DOI: https://doi.org/10.1038/s41598-023-33003-7
