Defining a city — delineating urban areas using cell-phone data

Dong, Lei; Duarte, Fabio; Duranton, Gilles; Santi, Paolo; Barthelemy, Marc; Batty, Michael; Bettencourt, Luís; Goodchild, Michael; Hack, Gary; Liu, Yu; Pumain, Denise; Shi, Wenzhong; Verbavatz, Vincent; West, Geoffrey B.; Yeh, Anthony G. O.; Ratti, Carlo

doi:10.1038/s44284-023-00019-z

Perspective
Published: 11 January 2024

Defining a city — delineating urban areas using cell-phone data

Nature Cities volume 1, pages 117–125 (2024)Cite this article

5571 Accesses
57 Altmetric
Metrics details

Subjects

Abstract

What is a city? Researchers use different criteria and datasets to define it—from population density to traffic flows. We argue there is one dataset that could serve as a proxy of the temporal and spatial connections that make cities what they are: geolocated data from the world’s more than 7 billion cell-phone users. Cell-phone data are a proxy of people’s presence in a given area and of their movement between areas. Combined with computational methods, these data can support city delineations that are dynamic, responding to multiple statistical and administrative requirements, and tailored to different research needs, thus accelerating ongoing work in urban science.

You have full access to this article via your institution.

Download PDF

A Global Feature-Rich Network Dataset of Cities and Dashboard for Comprehensive Urban Analyses

Article Open access 30 September 2023

Hierarchical organization of urban mobility and its connection with city livability

Article Open access 23 October 2019

Urbanity: automated modelling and analysis of multidimensional networks in cities

Article Open access 25 July 2023

Main

The first cities emerged more than 5,000 years ago in ancient Sumeria, where the agricultural surpluses led to the creation of marketplaces that formed the seeds of an embryonic urban economy based on exchange¹. Cities grew up around these marketplaces, where populations serving their agricultural hinterlands were packed in close proximity so that the costs of distribution could be minimized, and fortifications were built to keep the city intact from invaders intent on capturing the market and its resources. Until the Industrial Revolution, most cities remained small, with the biggest barely exceeding a million people. It took the invention of mechanical transportation technologies for cities to begin to exceed this limit for the first time. By the end of the nineteenth century, London had grown to over 6 million people and New York to over 4 million people². As a key factor, transportation and technological advances have further broken the historical boundaries of cities. Populations began to spread out into their hinterlands, blurring the line between urban and rural. By the middle of the last century, some large cities had begun to fuse with one another, creating megacities such as on the United States’ eastern seaboard³, while more recent growth has led to urban complexes such as China’s Greater Bay Area, which now exceeds 86 million people.

From a qualitative perspective, many scholars have addressed the question of ‘What is a city?’^4,5; and from a quantitative perspective, in the 1960s, Berry developed the delineation of metropolitan areas using commuting flows^6,7, and the current boundaries of metropolitan areas in the United States and some other regions and countries are still based on this methodology^8,9. More recently, the United Nations Human Settlements Programme (UN-Habitat) published a report under the premise that, to achieve the Sustainable Development Goals, we need to propose solutions to key urban challenges, and to do so, we need a clear and globally agreed definition of what a city is¹⁰. This report discusses two definitions: one defines the city by its urban extent and uses satellite imagery to identify the morphology of human settlements and the density of built-up structures; another defines the city by its degree of urbanization¹¹, based on population (density) using grid cells, a methods that is also endorsed by the United Nations Statistical Commission¹². Although these definitions are laudable, two challenges still remain: one definition is based on physical features and the other on granular population data, and neither consider flows—which we will discuss later. UN-Habitat also pointed out that to operationalize any local or global policy, we need timely data and metrics.

Unprecedented geospatial datasets, increased processing power and recent advances in computational methods are now providing opportunities to expand our understanding of cities and to make the delineation of cities more accurate and timely. A new science is rapidly emerging. Starting from such datasets and methods, it aims to quantitatively describe cities and systems of cities^{13,14,15,16,17,18,19,20}. Despite the ‘discovery’ of some relatively robust urban laws (for example, gravity, allometric scaling and rank-size), we are still at the very beginning of elaborating this new science of cities, and what is still missing is a widely accepted and thoroughly discussed methodology to delineate a city.

This Perspective discusses how scholars and policy officials have been making efforts to reach theoretically sound and operational definitions of the city; the challenges that these efforts are still facing; and how the use of global, standardized datasets such as cell-phone records, coupled with rigorous guidelines and methods for delineating cities informed by such datasets, can help make substantial steps towards widely agreed city delineations for urban science. A key argument of this paper is that there is an essential feature that could unify the most well-accepted definitions of cities: the presence and flows of people. In addition, we propose cell-phone data as a possible dataset representing a universal proxy for the presence and flows of people that define cities. After all, as Shakespeare said: “What is the city but the people?” (Coriolanus, Act 3, Scene 1).

Existing definitions

Before moving forward, we must pose an important clarification of terminologies: ‘definition’ refers to the meaning, or how we explain something; whereas ‘delineation’ refers to how we establish the boundaries of something—in our case, the definition is the set of rules one applies and the delineation is the city boundaries resulting from the implementation of these rules.

Definitions based on administrative boundaries

The administrative divisions traditionally used to delineate cities are based on very different parameters. For example, Brazil has 5,570 municipalities, 68% of which have fewer than 20,000 inhabitants, and Barcelos, slightly above this threshold, has an area larger than North Korea or New York State; however, there are only about 300 prefecture-level cities in China, with an average population of nearly 4 million people²¹. These vastly different administrative delineations of cities in different regions pose a major challenge for comparative studies on a global scale.

To tackle this issue, national or transnational statistical offices have long used quantitative definitions of metropolitan areas. One common definition of a metropolitan area consists of core urban areas and commuting zones surrounding the center. From a labor market perspective, this dependency is usually represented by a combination of population and commuting trips. For example, core-based statistical areas (CBSAs) in the United States are defined by starting from a county containing a city of 10,000 to 50,000 (micropolitan areas) or at least 50,000 (metropolitan areas) inhabitants, which grows to encompass surrounding counties based on criteria that combine population density and a minimum threshold of commuting flow to the main city⁸. Using this definition, over 900 CBSAs are identified in the United States, with New York–Newark–Jersey City being the largest, with over 20 million inhabitants.

In China, a quantitative definition of metropolitan areas was not available until very recently²². According to China’s National Development and Reform Commission, metropolitan areas are centered on megacities (10 million people or more) or large cities (5 million) and are bounded by a 1-hour commuting buffer. With this high-population-size criterion, China has far fewer metropolitan areas than the United States or the European Union (EU).

In an effort to facilitate international comparison, metropolitan areas in the EU are defined using the notion of functional urban areas (FUAs)^23,24. According to this definition, one must first identify an ‘urban center’ based on a minimum total population of 50,000 and additional density requirements. Then, surrounding centers are added to the FUA based on a combination of population size and commuting flow requirements. Following this definition, 828 FUAs are identified in the EU, Switzerland, Iceland and Norway, with Paris being the largest with a population of over 13 million inhabitants.

While created to facilitate comparison, city definitions based on fixed thresholds on population, density and so on, such as FUA, address the problem of city delineation as a top-down, ‘one-size-fits-all’ approach, which might imply, for instance, that the capital of a small country such as Liechtenstein is not even considered a city. The notion of FUA also requires access to multiple data sources such as residential and nighttime population, and commuting flows, which might further hinder the wide application of such an approach to city delineation. What is lacking then is a systematic methodology for determining parametric thresholds (for example, total population, population density and volume of commuting flows) to be used in combination with universally accessible datasets.

Recent advances

With the advent and availability of granular geospatial data, new proposals and techniques of city delineation have emerged^25,26. They mostly fall into two categories, which we summarize as form-based and function-based delineations.

Form-based approaches start by classifying the basic spatial unit (for example, a geographic grid) as urban or non-urban using population, buildings, built-up area or nighttime light intensity, and then define cities as continuous urban units^27,28,29,30. For example, it has been suggested that a city is defined as a continuous cluster of more than X inhabitants and/or Y nighttime light intensities. The choice of X and Y, however, varies considerably among the different studies. Even when the choice is made using statistical thresholds²⁹, the arbitrariness is reduced but not eliminated.

Function-based approaches start from the observation that areas through which a notable fraction of people move within a given time should be considered part of the same city. In this case, the definition of a city as a network of socioeconomic relationships can be proxied by flows of people and goods within a certain space. This approach is especially relevant to the complex network community, which seeks to define and characterize cities as evolving networks of exchange and interaction^17,18. Examples of function-based delineations include megaregions in the United States identified by commuting flows³¹, metropolitan areas in Switzerland³² and functional areas in China^33,34,35. As mentioned earlier, whenever the notion of flow is involved, data can become a serious limitation.

Notably, the availability of remote-sensing imagery coupled with increased computational power has led to the emergence of many land classification studies in recent years³⁶. These studies typically use supervised machine learning models to perform classification of remote-sensing data and label pixels of built-up areas as an important land-use class^37,38. Although land classification uses so-called ground truth to determine whether a given pixel is urban (for example, buildings and roads) or not (for example, forest and rivers), deriving a delineation of the city as a whole from this information requires additional analytical steps. For example, while the pixels of built-up areas identified by remote-sensing imagery need to be aggregated to define a city, we must also determine how much of the contiguous built-up area is a city, and whether parks and green spaces, which are often classified as non-built-up areas, are part of the city. More importantly, satellite imagery mainly captures the physical and built environment of cities, and it is difficult to estimate the movement of people in the city and the resulting socioeconomic changes, which are arguably more critical to defining cities.

The drawbacks of existing city definitions based on administrative boundaries or solely on morphological or flow criteria are summarized in Box 1.

Box 1 Drawbacks of existing city definitions

(1) There is a multiplicity of levels of administrative boundaries from which it is hard to choose, and what each of these levels captures differ across countries and even within countries.

(2) Both form-based and function-based approaches require some arbitrary parameters.

(3) Function-based definitions require data that were not readily available before, especially for developing countries.

(4) Given the above points, it is very difficult to perform cross-regional and cross-country comparisons between cities based on existing definitions.

Requisites of a good definition

To allow for comparative and universal analysis, we advance some criteria for defining cities that could be widely accepted by the urban science community. On the basis of the above discussion, and building on previous work^39,40, we propose six criteria, as shown in Box 2.

The first criterion states that the principles on which a city definition is based should be easy to understand by a non-specialized audience. Also, the data processing and implementation procedures used to empirically test this definition should be easily replicable, so that a definition can be used in different contexts and by people with diverse backgrounds and know-how. Note that it is quite common that a conceptually clean definition translates into highly non-trivial data processing and implementation issues. In such cases, these non-trivial procedures should be made open access and adequately documented to facilitate use for as wide an audience as possible.

The second criterion describes the desirable characteristics of the dataset(s) on which a city definition should be based. By universal, we mean that the type of data used to ground a definition should not be specific to or unique to a particular geography or country, but should, to the maximum extent possible, be available in all cities. For example, considering data related to the movement of people in the city, a dataset pertaining to a specific travel mode would be useful only for the definition of cities where such a specific travel mode exists.

Besides being universal, the datasets that can be used to delineate a city should be relatively easy to access. Ideally, one should use data that are publicly available. Such wide accessibility would allow any researcher, scholar or practitioner to collect and apply the necessary data to delineate a city within its geography of interest. However, publicly accessible data are typically highly aggregated for reasons related to storage and/or computing costs as well as privacy concerns, and do not have the granularity necessary for use in quantitative city definition. Data with higher granularity require extensive resources for storage and processing, are highly valuable to the entity that generated or acquired them, and may raise high privacy concerns. For these reasons, access to highly granular data is often restricted, and getting access to them requires signing non-disclosure agreements and/or paying data access fees. Access to data could hence become a major challenge in the quest for a universal, quantitative city definition—a topic that will be discussed further in this paper.

The third property argues for the need to consider both formal and functional features of a city. Clearly, how these features are incorporated into a city delineation depends on which aspects of a city we are interested in analyzing. Taking as an example the Greater Bay Area in China, being interested in cities as labor markets might suggest we focus on, say, ten spatial units (cities) in that area. In contrast, an analysis focused on clusters of innovative activity might isolate only three places (Hong Kong, Shenzhen and Guangzhou) within the same area. Thus, the form and functional features to be considered in the city delineations required for these two exemplary analyses should be defined accordingly.

The fourth property points to the need to carefully select parameters and thresholds in defining a city. Arbitrary parameter selection would de facto impair the ability to compare cities across regions and countries and to address global research questions related to urbanization. For example, the UN Statistical Commission suggests defining cities based on a total population of at least 50,000 and a minimum population density threshold¹². However, in the United States, the criteria for defining a city based on population vary by state, with some states, such as Massachusetts, requiring at least 12,000 inhabitants, and others, such as Iowa, having no minimum population requirement at all.

The UN approach¹² uses an absolute definition, meaning that a human settlement would be classified in the same way across space and time. The US cases are relative definitions that identify cities in a context-dependent manner and thus vary across space and time. While absolute definitions have many advantages, such as simplicity, stability and ease of implementation, we must recognize that urbanization is a dynamic process and varies greatly from country to country (a settlement of 5,000 people in the Congo is nowhere close to a settlement of 5,000 people in, say, the Netherlands). We, therefore, recommend the relative definition or a combination of relative and absolute definitions, with fixed parameters within a country and different parameters for different countries. In such settings, the use of context-specific thresholds should be carefully controlled, and, if possible, it might be useful to establish data-driven methods to set parameters and thresholds used in a city delineation. (An example of such a parameter-free method used for computing thresholds to define city hotspots from cell-phone data was proposed in ref. ⁴¹. More will be discussed in the next section).

The fifth property highlights the importance of seeking extrinsic assessment methods for a city definition. This property is especially important and challenging. One possibility would be to evaluate a city definition both qualitatively and by checking whether some fundamental properties observed in multiple metropolitan regions of the world also hold for the ‘cities’ resulting from the specific city definition at hand. For example, recent studies of urban scaling laws have found that many fundamental scaling properties hold only within specific functional city boundaries^42,43. As shown in Box 3, we suggest first computing the population and area of a ‘city’ as defined by the specific definition at hand, and then analyzing the properties of the ‘city’ population or area distribution over a larger geography (for example, a country).

Lastly, depending on the case under study, administrative boundaries are inescapable, as usual statistical indicators (for example, population, economic or political indices) are generally collected at the administrative level. This means that matching with existent political boundaries would allow the analyst to have access to much data available, for instance, through census data. However, the benefit of this additional source of information heavily depends on the granularity of the smallest spatial unit used to aggregate such information, which can range from a few blocks in a dense urban area in the United States to hundreds of square kilometers for countries such as Colombia. It should also be noted that administrative boundaries do have a critical impact on cities, given the role they play in ‘acting’ on the city through planning and policy-making—thus, the administrative boundaries drive or condition the presence and flows of people, as well as the city’s form and functions. Therefore, in such cases where socioeconomic–political indicators are relevant, a good definition of a city should not overlook the role of administrative boundaries. However, the delineation of administrative boundaries could be updated to better reflect the actual formal and functional characteristics of an urban area as defined by a good city delineation strategy. What we are looking for is a delineation that attempts to find the best fit between a purely data-driven delineation and existing spatial boundaries.

Box 2 Criteria for good city definitions

(1) A city definition should be easy to understand and easy to apply by researchers and practitioners in the field.

(2) The data on which a city definition is built upon must be universal, easy to access and self-explanatory.

(3) It should consider both urban form and function, which are the essential elements of a city.

(4) A definition should be based on careful control of the involved parameters. Parameter choices should be clearly motivated and, where possible, data driven, to avoid arbitrary choices. In some cases, data-driven parameter setting might render them endogenous, thus making the city definition parameter free.

(5) The city delineations resulting from a definition should seek some form of validation based on a procedure that should be general and widely accepted by the urban science community. Preferably, this procedure should include one or more quantitative criteria to assess the quality of a definition.

(6) Depending on the case studied, city delineations resulting from a definition might be later combined with administrative boundaries.

Box 3 A feasible process for city delineation with cell-phone data

(1) Population density can be calculated from cell-phone data at the grid cell level (a grid of about 1 km is recommended, taking into account the resolution of cell-phone data and the scale of people’s activities at the neighborhood level).

(2) Flow of people between grid cells can also be derived from cell-phone data (Fig. 1a).

(3) Aggregate grid cells with population densities and flows above certain thresholds. One possible approach to set thresholds is to exploit the critical characteristics of the urban system, which could make these two thresholds endogenous and could be determined from the statistics or the phase diagram (that is, natural discontinuities). Another approach is goal oriented, for instance, if the goal is to make the output as close as possible to a city delineation with desired features, such as reproducing FUA boundaries.

(4) Multiple dimensional validations (for example, statistical characteristics in urban systems, similarity to existing metropolitan boundaries in terms of morphology, population size and so on).

Opportunities from using cell-phone data

We propose cell-phone data as the basic dataset in finding a solution to the quest for standardized city definitions. Cell-phone data typically come in one of two forms: call detail records (CDR) or global positioning system (GPS) records. CDR reports geolocate events such as sending a text, making or receiving a call, and so on. The record typically contains at least an (anonymized) user ID, the time of the event and the ID of the cell tower to which the user is connected at the time of the event. GPS records provide more detailed spatiotemporal information, typically reporting at least the coordinates (latitude and longitude) of the data point, the time and possibly other information, such as speed, heading and so on. Note that we mainly consider the geospatial attributes of cell-phone data, which have been widely used in the study of human activity and urban science in recent decades^44,45. There are also a few studies that use cell-phone communication information to study social networks⁴⁶. However, compared with location attributes, individual-level cell-phone data with rich semantics are very difficult to access, so the cell-phone data used in this paper mainly refer to cell-phone location data.

The main difference between the two types of cell-phone dataset is in the spatial and temporal resolution of the information—as shown in Fig. 1. In the case of CDR, the spatial resolution is determined by the coverage area of the cell tower, which ranges between a few hundred meters in densely populated areas to a few tens of square kilometers in low-density areas—with a tendency of reducing cell coverage area with newer cellular communication technologies such as 5G. The temporal resolution for CDR data depends on the frequency of recorded events, which in turn is determined by how frequently the user uses the phone for texting, calling, browsing the Internet and so on. GPS data have a much higher accuracy in both time and space. Spatial accuracy is typically in the range of a few tens of meters, while the temporal recording of events is typically quite regular, with sampling intervals ranging from a few seconds to a few minutes.

**Fig. 1: Cell-phone data and process methods.**

It is estimated that about 97% of the world’s population is covered by a cellular network. Owing to this ubiquity and the worldwide availability of the Global System for Mobile communication standard, cell-phone data provide a consistent collection mechanism across the world and are independent of national censuses. This meets the requirements of the second point described in Box 2. However, access to cell-phone data is not easy, which might challenge the requirement for easy access to data described in Box 2. The issue of access to cell-phone datasets is further discussed at the end of this paper.

Importantly, when it comes to city definition and delineation, cell-phone data offer the ability to capture both form features (such as nighttime and daytime presence of people in an area) and functional features (such as mobility flows between two areas)—as shown in Fig. 1. With well-established methods for estimating stay points and origin–destination flows from trajectories⁴⁷, cell-phone data can be used to compute both high-resolution population distributions and commuting flows, becoming a valid alternative to classical labor market approaches that provide a satisfactory answer to only the second point. Cell-phone data can be used to estimate flows in the city more generally, allowing for an accurate picture of movement patterns within a given geographical area, opening up the possibility of defining functional cities with greater granularity.

Recent research based on cell-phone data and urban science has also shed additional light on the fourth and fifth points described in Box 2, showing the potential of using cell-phone data to uncover foundational features of human movement that can provide a quantitative ground for city delineation. For instance, Alessandretti et al.⁴⁸ discovered that humans tend to perform a constant number of weekly movements across different geographies, while Schläpfer et al.⁴⁹ unveiled an inverse square law between distance and frequency of trips. Other works following this line of research are by refs. ^50,51.

Moreover, as the result of a process with self-organization characteristics, the city system has been found to exhibit discontinuous changes—phase transitions^17,52. One transition example was provided by the hierarchical percolation on the British road network and on its intersections, which show abrupt changes at critical parameter values⁵³. The clusters defined at these discontinuous thresholds were found to be in good agreement with city boundaries estimated from satellite imagery. Similar percolation phenomena have been found in city systems in many parts of the world^54,55. These characteristics allow us to define cities using endogenous parameters—determined by the thresholds of the system, referred to as ‘critical’, which mark a phase transition, making the thresholds in the definition process obtained from the data, rather than purely arbitrary settings (the fourth point in Box 2). It is worth noting that the commonly used sensitivity analysis does not necessarily yield effective parameter thresholds. This is because the result can vary greatly when the threshold exceeds the phase-transition point⁵³, and so this is often not considered a good threshold in sensitivity analysis.

In practice, the variability of form and function of cities in different regions, and the quality of the data itself, make it not always possible to find such discontinuities⁵⁶. Even with critical parameter values, we still need a set of metrics to evaluate whether the city boundaries derived from cell-phone data are reasonable (the fifth point in Box 2). In particular, Zipf’s law has been used in previous work as evidence of good delineation²⁷. Yet recent work⁵⁷ has shown that Zipf’s law can be problematic for city definition as the law’s exponents fluctuate considerably in empirical data^58,59. With cell-phone data, we can do far more than Zipf’s law to validate a city delineation. For example, we can examine the allometric urban scaling laws Y ≈ N^β, where N is the population in each city cluster and Y is the variable of area, interaction and output of each city cluster⁶⁰. The scaling exponent β can be estimated from cell-phone data and can be used to test whether the exponent in the boundary quantified by cell-phone data is consistent with the theoretical prediction⁴⁰.

Returning to the global comparison, cell-phone data and the aforementioned characteristics of city systems provide an unprecedented opportunity to consider both absolute and relative city definitions. The consistency and universality of cell-phone data makes it easier to quantify cities using absolute thresholds. And the granularity of cell-phone data also allows us to set a ‘relative absolute threshold’. For example, one can take the 95th percentile of distance to work and define local labor markets and corresponding city boundaries. The methodology for doing this remains to be developed given the continuous nature of the data (Fig. 2 shows one possible way to find such thresholds). Yet a great advantage here is that the distance bands obtained by these statistical thresholds (absolute definition) will differ across countries (relative definition).

**Fig. 2: An illustration delineating cities using cell-phone data.**

As a summary, in Box 3 and Fig. 2, we outline a feasible process for city delineation with cell-phone data. First, we use cell-phone data to calculate the population density within a grid and merge continuous high-density grids into clusters. Second, also using cell-phone data, we calculate the commuting flows between clusters and further aggregate clusters with flows above a certain threshold. This step of the definition is similar to the FUAs, which also require two parameters: population density and commuting ratio. Third, the critical characteristics of the city system make these two parameters endogenous and can be determined from the phase diagram. After the above steps, we can obtain high-density and tightly connected city clusters. Fourth, the derived city clusters can be validated by using a set of criteria (for example, scaling laws, similarity to the existing metropolitan boundaries). Recall the last point in Box 2, to make the results more useful in practice, we can further aggregate the clusters at the administrative level.

Conclusions

To summarize, in this Perspective, we have extensively discussed why we believe that the urban science community should carefully consider the problem of how to define cities and metropolitan areas, and have proposed a set of guidelines and widely accepted methods for delineating cities. We are not advocating here for a single definition, as we believe that different ones will be needed depending on the specific problem and research question being analyzed. Rather, we advocate for the use of delineations that conform to the conceptual framework described in Box 2, so that, regardless of the specific definition used for the problem at hand, the requirement of universality, leveraging standard and widely available data is still satisfied. This approach would facilitate the replicability of studies and comparisons across different geographies, bringing invaluable benefits to the research community.

We have also advocated the use of cell-phone data as a proxy for the essential features of any city definition: urban form and function. Cell-phone datasets are not the only example of standardized data that can be used for this purpose. If needed, the integration of cell-phone datasets with point-of-interest information, global road network repositories such as OpenStreetMap, social network data, remote-sensing imagery and so on could provide additional opportunities for city delineation. However, a challenge in this context is that some of these complementary datasets may not be globally available in the same format.

It is important to note here that the spatial and temporal resolution of the data is a critical factor when considering access to data that is related to human ethics and privacy protection. While gaining access to individual-level GPS trajectories might be extremely difficult, this level of detail is often not necessary in the process of delineating city boundaries. For this purpose, access to spatially and temporally aggregated data (for example, how many people moved from area A to area B in one day) is often sufficient. Generally speaking, the coarser the level of spatiotemporal resolution, the easier it is to gain access to a dataset. Thus, the question of which is the minimal resolution needed in the context of city delineation is a prominent question that has mostly been evaded so far in the literature. There is no single answer to this question since, as extensively argued in this paper, how to define a city depends on the context and scope of the study, and so does the minimum data resolution needed to support a specific city definition. Similarly, the data aggregation level and chosen spatial unit should be suited to the context and scope of the study.

Cell-phone datasets come with limitations related to the fact that although cell-phone ownership is wide, some socioeconomic groups tend to be overrepresented (younger, wealthier populations) or underrepresented (older populations). If we could have a globally covered functional city dataset, scholars and policymakers in a variety of fields such as urban planning, transportation, epidemiology and sustainability would benefit greatly. Yet, despite caveats and the uneven penetration of smartphones, the more than 18 billion cell phones projected to be in use globally by 2025 are arguably the most widely spread proxy for the presence and flows of people.

Global access to cell-phone data is a challenge: even if an increasing number of open cell-phone datasets have been made available recently (such as the Data for Development (D4D) challenge, Uber Movement (https://movement.uber.com/) and Baidu Migration (https://qianxi.baidu.com/)), the data in general are owned by private companies, protected by non-disclosure agreements or access fees, subject to strict privacy regulations and so on. In this regard, we would encourage the urban science community to join forces to develop satisfactory definitions of cities based on cell-phone data or other standardized, global datasets, and to work with international organizations, such as the International Telecommunication Union, to make anonymized cell-phone data available more broadly, as a clear delineation of cities is critical to addressing current and future urban problems affecting populations globally. Collecting, pooling and analyzing such data on a global scale requires nothing less than a collective effort, which we all call for.

References

Bairoch, P. Cities and Economic Development: From the Dawn of History to the Present (Univ. Chicago Press, 1988).
Chandler, T. & Fox, G. Three Thousand Years of Urban Growth (Academic Press, 1974).
Gottmann, J. Megalopolis: The Urbanized Northeastern Seaboard of the United States (Twentieth Century Fund, 1961).
Mumford, L. What is a city? Archit. Rec. 82, 93–96 (1937).
Google Scholar
Varzi, A. C. What is a city? Topoi 40, 399–408 (2021).
Article Google Scholar
Berry, B. J. The impact of expanding metropolitan communities upon the central place hierarchy. Ann. Assoc. Am. Geogr. 50, 112–116 (1960).
Article Google Scholar
Berry, B. J., Goheen, P. G. & Goldstein, H. Metropolitan Area Definition: A Re-evaluation of Concept and Statistical Practice Vol. 28 (US Bureau of the Census, 1969).
US Office of Management and Budget 2010 standards for delineating metropolitan and micropolitan statistical areas. Fed. Regist. 75, 37246–37252 (2010).
Adams, J. S., VanDrasek, B. J. & Phillips, E. G. Metropolitan area definition in the United States. Urban Geogr. 20, 695–726 (1999).
Article Google Scholar
What is a City? Technical Report (UN-Habitat, 2020).
Dijkstra, L. et al. Applying the degree of urbanisation to the globe: a new harmonised definition reveals a different picture of global urbanisation. J. Urban Econ. 125, 103312 (2021).
A Recommendation on the Method to Delineate Cities, Urban and Rural Areas for International Statistical Comparisons. Technical Report (UN Statistical Commission, 2020).
Berry, B. J. Cities as systems within systems of cities. Pap. Reg. Sci. 13, 147–163 (1964).
Article Google Scholar
Henderson, J. V. The sizes and types of cities. Am. Econ. Rev. 64, 640–656 (1974).
Google Scholar
Pumain, D. Scaling laws and urban systems. Working paper (Santa Fe Institute, 2004).
Barthelemy, M. The Structure and Dynamics of Cities (Cambridge Univ. Press, 2016).
Batty, M. The New Science of Cities (MIT Press, 2013).
Bettencourt, L. M. Introduction to Urban Science: Evidence and Theory of Cities as Complex Systems (MIT Press, 2021).
Duranton, G. & Puga, D. Urban growth and its aggregate implications. Econometrica 91, 2219–2259 (2023).
Caldarelli, G. et al. The role of complexity for digital twins of cities. Nat. Comput. Sci. 3, 374–381 (2023).
National Bureau of Statistics of China China City Statistical Yearbook (China Statistics Press, 2022).
Chen, Z. & Yeh, A. G.-O. Is prefecture-level city a “city” in China: a critical review. Eurasian Geogr. Econ. https://doi.org/10.1080/15387216.2023.2267064 (2023).
Cheshire, P. C. & Hay, D. G. Urban Problems in Western Europe: An Economic Analysis (Unwin Hyman, 1989).
Dijkstra, L., Poelman, H. & Veneri, P. The EU-OECD Definition of a Functional Urban Area Regional Development Working Papers (OECD, 2019).
Duranton, G. Classifying locations and delineating space: an introduction. J. Urban Econ. 125, 103353 (2021).
Moreno-Monroy, A. I., Schiavina, M. & Veneri, P. Metropolitan areas in the world. Delineation and population trends. J. Urban Econ. 125, 103242 (2021).
Article Google Scholar
Rozenfeld, H. D., Rybski, D., Gabaix, X. & Makse, H. A. The area and population of cities: new insights from a different perspective on cities. Am. Econ. Rev. 101, 2205–2225 (2011).
Dingel, J. I., Miscio, A. & Davis, D. R. Cities, lights, and skills in developing economies. J. Urban Econ. 125, 103174 (2021).
Article Google Scholar
De Bellefon, M.-P., Combes, P.-P., Duranton, G., Gobillon, L. & Gorin, C. Delineating urban areas using building density. J. Urban Econ. 125, 103226 (2021).
Article Google Scholar
Tannier, C., Thomas, I., Vuidel, G. & Frankhauser, P. A fractal approach to identifying urban boundaries. Geogr. Anal. 43, 211–227 (2011).
Article Google Scholar
Nelson, G. D. & Rae, A. An economic geography of the United States: from commutes to megaregions. PLoS One 11, e0166083 (2016).
Dessemontet, P., Kaufmann, V. & Jemelin, C. Switzerland as a single metropolitan area? A study of its commuting network. Urban Stud. 47, 2785–2802 (2010).
Article Google Scholar
Ma, S. & Long, Y. Functional urban area delineations of cities on the Chinese mainland using massive Didi ride-hailing records. Cities 97, 102532 (2020).
Article Google Scholar
Chen, Z. & Yeh, A. G.-O. Delineating functional urban areas in Chinese mega city regions using fine-grained population data and cellphone location data: a case of Pearl River Delta. Comput. Environ. Urban Syst. 93, 101771 (2022).
Chen, T., Gu, Y. & Zou, B. Delineating China’s metropolitan areas using commuting flow data. Preprint at SSRN https://ssrn.com/abstract=4052749 (2023).
Liu, X. et al. High-spatiotemporal-resolution mapping of global urban change from 1985 to 2015. Nat. Sustain. 3, 564–570 (2020).
Article Google Scholar
Baragwanath, K., Goldblatt, R., Hanson, G. & Khandelwal, A. K. Detecting urban markets with satellite imagery: an application to India. J. Urban Econ. 125, 103173 (2021).
Galdo, V., Li, Y. & Rama, M. Identifying urban areas by combining human judgment and machine learning: an application to India. J. Urban Econ. 125, 103229 (2021).
Duranton, G. A proposal to delineate metropolitan areas in Colombia. Desarro. Soc. 75, 223–264 (2015).
Cao, W. et al. Constructing multi-level urban clusters based on population distributions and interactions. Comput. Environ. Urban Syst. 99, 101897 (2023).
Article Google Scholar
Louail, T. et al. From mobile phone data to the spatial structure of cities. Sci. Rep. 4, 5276 (2014).
Article Google Scholar
Rosen, K. T. & Resnick, M. The size distribution of cities: an examination of the Pareto law and primacy. J. Urban Econ. 8, 165–186 (1980).
Stier, A. J. et al. Reply to Huth et al.: Cities are defined by their spatially aggregated socioeconomic networks. Proc. Natl Acad. Sci. USA 119, e2119313118 (2022).
Article Google Scholar
Blondel, V. D., Decuyper, A. & Krings, G. A survey of results on mobile phone datasets analysis. EPJ Data Sci. 4, 10 (2015).
Pappalardo, L., Manley, E., Sekara, V. & Alessandretti, L. Future directions in human mobility science. Nat. Comput. Sci. 3, 588–600 (2023).
Article Google Scholar
Park, P. S., Blumenstock, J. E. & Macy, M. W. The strength of long-range ties in population-scale social networks. Science 362, 1410–1413 (2018).
Article Google Scholar
Alexander, L., Jiang, S., Murga, M. & González, M. C. Origin–destination trips by purpose and time of day inferred from mobile phone data. Transp. Res. Part C 58, 240–250 (2015).
Article Google Scholar
Alessandretti, L., Sapiezynski, P., Sekara, V., Lehmann, S. & Baronchelli, A. Evidence for a conserved quantity in human mobility. Nat. Hum. Behav. 2, 485–491 (2018).
Article Google Scholar
Schläpfer, M. et al. The universal visitation law of human mobility. Nature 593, 522–527 (2021).
Article Google Scholar
Grauwin, S. et al. Identifying and modeling the structural discontinuities of human interactions. Sci. Rep. 7, 46677 (2017).
Article Google Scholar
Alessandretti, L., Aslak, U. & Lehmann, S. The scales of human mobility. Nature 587, 402–407 (2020).
Article Google Scholar
Barthelemy, M. The statistical physics of cities. Nat. Rev. Phys. 1, 406–415 (2019).
Article Google Scholar
Arcaute, E. et al. Cities and regions in Britain through hierarchical percolation. R. Soc. Open Sci. 3, 150691 (2016).
Article Google Scholar
Cao, W., Dong, L., Wu, L. & Liu, Y. Quantifying urban areas with multi-source data based on percolation theory. Remote Sens. Environ. 241, 111730 (2020).
Article Google Scholar
Montero, G., Tannier, C. & Thomas, I. Delineation of cities based on scaling properties of urban patterns: a comparison of three methods. Int. J. Geogr. Inf. Sci. 35, 919–947 (2021).
Article Google Scholar
Bosker, M., Park, J. & Roberts, M. Definition matters. Metropolitan areas and agglomeration economies in a large-developing country. J. Urban Econ. 125, 103275 (2021).
Article Google Scholar
Verbavatz, V. & Barthelemy, M. The growth equation of cities. Nature 587, 397–401 (2020).
Article Google Scholar
Cura, R. et al. The old and the new: qualifying city systems in the world with classical models and new data. Geogr. Anal. 49, 363–386 (2017).
Article Google Scholar
Cristelli, M., Batty, M. & Pietronero, L. There is more than a power law in Zipf. Sci. Rep. 2, 812 (2012).
Article Google Scholar
West, G. Scale: The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life in Organisms, Cities, Economies, and Companies (Penguin, 2017).

Download references

Acknowledgements

L.D. and Y.L. acknowledge the support of the National Natural Science Foundation of China (No. 41830645). M. Batty acknowledges support from the Alan Turing Institute under QUANT2-Contract-CID-381581. G.B.W. acknowledges the generous support of Toby Shannon via the Charities Aid Foundation (CAF) of Canada and of the NSF under grant no. PHY1838420. F.D., P.S. and C.R. acknowledge all members of MIT Senseable City Lab consortium.

Author information

Authors and Affiliations

Institute of Remote Sensing and Geographic Information Systems, School of Earth and Space Sciences, Peking University, Beijing, China
Lei Dong & Yu Liu
Senseable City Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
Fabio Duarte, Paolo Santi & Carlo Ratti
University of Pennsylvania, Philadelphia, PA, USA
Gilles Duranton & Gary Hack
The Institute of Informatics and Telematics CNR, Pisa, Italy
Paolo Santi
Université Paris-Saclay, CNRS, CEA, Institut de Physique Théorique, Gif-sur-Yvette, France
Marc Barthelemy & Vincent Verbavatz
Centre d’Analyse et de Mathématique Sociales (CNRS/EHESS), Paris, France
Marc Barthelemy
University College London, London, UK
Michael Batty
University of Chicago, Chicago, IL, USA
Luís Bettencourt
University of California Santa Barbara, Santa Barbara, CA, USA
Michael Goodchild
Massachusetts Institute of Technology, Cambridge, MA, USA
Gary Hack
Université Paris 1 Panthéon-Sorbonne, Paris, France
Denise Pumain
Hong Kong Polytechnic University, Hong Kong SAR, China
Wenzhong Shi
Santa Fe Institute, Santa Fe, NM, USA
Geoffrey B. West
The University of Hong Kong, Hong Kong SAR, China
Anthony G. O. Yeh

Authors

Lei Dong
View author publications
You can also search for this author in PubMed Google Scholar
Fabio Duarte
View author publications
You can also search for this author in PubMed Google Scholar
Gilles Duranton
View author publications
You can also search for this author in PubMed Google Scholar
Paolo Santi
View author publications
You can also search for this author in PubMed Google Scholar
Marc Barthelemy
View author publications
You can also search for this author in PubMed Google Scholar
Michael Batty
View author publications
You can also search for this author in PubMed Google Scholar
Luís Bettencourt
View author publications
You can also search for this author in PubMed Google Scholar
Michael Goodchild
View author publications
You can also search for this author in PubMed Google Scholar
Gary Hack
View author publications
You can also search for this author in PubMed Google Scholar
Yu Liu
View author publications
You can also search for this author in PubMed Google Scholar
Denise Pumain
View author publications
You can also search for this author in PubMed Google Scholar
Wenzhong Shi
View author publications
You can also search for this author in PubMed Google Scholar
Vincent Verbavatz
View author publications
You can also search for this author in PubMed Google Scholar
Geoffrey B. West
View author publications
You can also search for this author in PubMed Google Scholar
Anthony G. O. Yeh
View author publications
You can also search for this author in PubMed Google Scholar
Carlo Ratti
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All the authors participated in the discussion and writing of the paper.

Corresponding author

Correspondence to Paolo Santi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Cities thanks Lewis Dijkstra, Mark Montgomery and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Dong, L., Duarte, F., Duranton, G. et al. Defining a city — delineating urban areas using cell-phone data. Nat Cities 1, 117–125 (2024). https://doi.org/10.1038/s44284-023-00019-z

Download citation

Received: 26 June 2023
Accepted: 04 December 2023
Published: 11 January 2024
Issue Date: February 2024
DOI: https://doi.org/10.1038/s44284-023-00019-z