Global urban growth between 1870 and 2100 from integrated high resolution mapped data and urban dynamic modeling

Long term, global records of urban extent can help evaluate environmental impacts of anthropogenic activities. Remotely sensed observations can provide insights into historical urban dynamics, but only during the satellite era. Here, we develop a 1 km resolution global dataset of annual urban dynamics between 1870 and 2100 using an urban cellular automata model trained on satellite observations of urban extent between 1992 and 2013. Hindcast (1870–1990) and projected (2020–2100) urban dynamics under the five Shared Socioeconomic Pathways (SSPs) were modeled. We find that global urban growth under SSP5, the fossil-fuelled development scenario, was largest with a greater than 40-fold increase in urban extent since 1870. The high resolution dataset captures grid level urban sprawl over 200 years, which can provide insights into the urbanization life cycle of cities and help assess long-term environmental impacts of urbanization and human–environment interactions at a global scale. Projected urban sprawl and environmental impacts of urbanization are greatest under a fossil-fuelled development shared socioeconomic pathway, according to global, high resolution urban cellular automata modelling.

demand, which essentially characterizes a linear relationship between per capital urban area and socioeconomic variables (e.g., per capital GDP and the urbanization rate) of all spatial units 26 . Such a relationship could be too simple to capture discrepancies of urban demand in different urbanization stages and differences of urban sprawl pathways in different regions [42][43][44] . Fourth, recent studies found that the newly developed lands play a more important role in urban sprawl compared to the early urbanized areas 37 . Thus, an urban sprawl model with a spatially explicit consideration of the temporal effect of urbanized pixels can better capture the complex urban sprawl at the global scale. Finally, production of global urban extent dynamics spanning hundred years is still lacking, although such a dataset is of great importance to global environmental change studies.
In this paper, we developed a modeling framework to hindcast and project global urban sprawl at a 1 km resolution from 1870 to 2100. First, we calibrated a global urban CA (i.e., the Logistic-Trend-CA) model with the consideration of the temporal effect of urbanized pixels using a longer than two decades of urban extent dataset from nighttime light (NTL) satellite observations. Next, we hindcasted urban shrink from 1992 back to 1870 using the calibrated Logistic-Trend-CA model. Similarly, we projected future global urban sprawl until 2100 under the five shared socioeconomic pathways (SSPs) scenarios 45 . The SSPs describe five alternative ways in which societal factors such as demographics, human development (for example, health and education), economic growth, inequality, governance, technological change, and policy orientations might evolve in the future 45 . Finally, we combined urban extent from observations, hindcast, and projection and generated the long-term dataset of urban extent from 1870 to 2100.

Results
Long-term dynamics of global urban extent. To the best of our knowledge, our product of urban extent for the first time presents a view of urban sprawl across the world for more than 200 years (Fig. 1). Temporally, the growth rate of global urban extent during the hindcast period is 3,230 km 2 per year, which is about one sixth (20,000 km 2 ) of the growth rate in the historical period (1992-2013) observed by satellites. Under five SSPs, the growth rates of global urban extent fall into the range of 10,000-240,000 km 2 per year. Urban sprawl under SSP3 (regional rivalry) and SSP4 (inequality) is notably slower than that in the past two decades, while urban sprawl under SSP5 (fossil-fueled development) is higher 44,45 . Under SSP2 (middle of the road), there is a distinctive shift of urban sprawl hotspots from North America and Europe in earlier periods (i.e., before 1990s) to Asia and Africa in the future (i.e., after 2050s), particularly in China and India as well as other countries in west Africa. The presented spatiotemporal dynamics of global urban extent are consistent with the findings reported in the "World Urbanization Prospects" 46 . Both developing countries in Asia and Africa and developed countries in the North America and Europe will experience notable urban growth under SSP5 due to the projected rapid economic development in this scenario. It is worth to note that the harmonization between HYDE and NTL derived urban extents was conducted based on observations in 1992, which led to a slightly abrupt change around 1990 in the overall trend.
Urban area changes at the continental scale. Our results of urban extents, from hindcast, projection, and remote sensing observations, provide a continuous and harmonized record of global urban dynamics spanning from the 1870s to 2100 (Fig. 2). A notable difference in urban growth patterns across continents was observed. Urban area is largest in Asia among all continents ARTICLE COMMUNICATIONS EARTH & ENVIRONMENT | https://doi.org/10.1038/s43247-021-00273-w in 2010, and this trend will continue although urban area will grow in all continents through 2100. Africa will be a new engine of urban growth in the second half of the century (i.e., after 2050) in SSP1, SSP2, and SSP5, though the urban proportion of Africa is relatively low 17 . In other continents (e.g., North America, South America, Europe, and Australia), there is moderate urban growth with a plateau stage albeit there are variations across SSPs. In general, our projection of urban extents under SSP2 (i.e., Middle of the Road) is in agreement with the United Nations' projections 46 , which projected that almost 90% of urban population growth would likely occur in Asia and Africa by 2050.
Changes in urban extent at the country scale. Our results of urban extent with a temporal span of more than 200 years reveal Fig. 1 An overview of the long-term dynamics of global urban extent. Change of global urban extent from 1870 to 2100 under five SSPs (a). Urban extent in representative years of historical and SSP2 scenario (i.e., Middle of the Road) (b). The urban extent was aggregated from 1 km to 1 degree as a percentage as represented in the maps. Maps in b were generated by ArcGIS Pro software (https://www.esri.com/en-us/arcgis/products/arcgis-pro/ overview).  pathways of urbanization across countries (Fig. 3). As such, the life cycle of urbanization (or city development) can be approximately characterized. It is crucial to consider the stage (i.e., initial, middle, and mature) of urbanization when analyzing the trend of urban growth. For example, as observed by satellites, the US is a developed country with a relatively low pace of urban growth 22,26,41,42,46 . From its pathway of urbanization during 1900-1990, we found a significant growth of urban areas (i.e., middle stage of the urbanization process) ( Supplementary Fig. 1). Such urban growth is similar to that in China over the past two decades ( Supplementary Fig. 1). Country-specific trends of urban growth can be derived from our product and can be used to understand their different stages of urbanization.
In general, among the five SSPs, the growth of urban extent under SSP5 (fossil-fueled development) and SSP4 (inequality) is the highest and lowest, respectively, at the global scale ( Supplementary Fig. 2). The primary difference among these SSPs is their trends of future population and the per capita urban area 44 . For example, the two most populated countries in Asia, China and India, exhibit notably different trends of urban growth due to the different population and per capita urban area pathways in the future. That is, the urban growth in China is anticipated to reach a plateau after 2050, while in India, urban growth under all scenarios show consistent trends of increase ( Supplementary Fig. 1), which is driven by the continuous increases of both population and per capita urban area in India 44 . In some countries in the middle of Africa, future urban extent is still low because these countries are at the initial stage of urbanization before 2050 ( Supplementary Fig. 2).
Dynamics of urban extent under different SSPs. Our urban sprawl dataset with a moderate spatial resolution clearly shows the evolution of urban extent over a long period in a spatially explicit way, as shown in a selected example of the Yangtze River Delta in China (Fig. 4), where a significant expansion of urban extent has been observed by satellite observations 17 . Obviously, there was only one isolated urban cluster (i.e., the main city of Shanghai) in the Yangtze River Delta of China in 1870. In the 1900s, there are only small settlements and the urban sprawl was slow. Over past two decades (1990-2010), this region experienced rapid urban sprawl due to the rapid migration of population from rural to urban areas. During this process, cities with different sizes grew and some of them were merged due to the development of traffic networks and the expansion of built-up areas 47 . Under SSPs, the continuous increase of population and economy drives the expansion of cities around Shanghai. The differences of spatial pattern in urban sprawl under the five SSPs are mainly driven by urban demand in these scenarios. Consistent with the growth of urban extent in China across the five SSPs, urban growth is largest under the SSP5, while it is smallest under the SSP4.
The long-term dynamics of urban extent vary across metropolitan areas under the SSPs ( Supplementary Fig. 3). We compared urban sprawl under SSP4 (i.e., inequality), SSP2 (i.e., middle of the road), and SSP5 (i.e., fossil-fueled development), which correspond to low, middle, and high growth rates, respectively 44 , across five metropolitan areas ( Supplementary  Fig. 3). In general, we observed a dramatic growth of urban area from 1870 to 2100 in these five regions. The initially sparely distributed and small urban patches grew and merged, resulting in notably enlarged urban clusters. The growth pattern of urban extent varies significantly across these metropolitan areas ( Supplementary Fig. 3). For metropolitan areas in the US and UK, there is no significant urban growth between 2050 and 2100 under SSPs. However, for other three metropolitan areas in China, Egypt, and Brazil, we observed noticeable urban sprawls during the period of 2050-2100.

Discussion
Information of urban dynamics, especially at the global scale and over a long time period, is of great importance to deepen our understanding of the urbanization process. The life cycle of cities generally spans over multiple decades or even longer, while available observations and records of urban land for cities are limited 43,48 . Hence, existing theories about urban land growth could be limited due to the lack of data. Although satellite observations have been extensively used to monitor the urban environmental change over past decades, global records of urban dynamics are still limited due to challenges such as data availability and computational capacity 49 . Moreover, satellite observations may only cover a short period of the life cycle of cities. These factors hindered the use of temporal contexts in urban growth studies. Urbanization in cities under different stages (i.e., initial, middle, and mature) 44 can be well captured from our longterm dataset of urban extent from 1870 to 2100.
Although there are some consistencies between our results and the other two relevant studies (Chen et al. 26 and Gao and O'Neill 42 ), trends of future global urban sprawl vary under different SSPs and across regions. Both, Chen et al. 26 and Gao and O'Neill 42 modeled global urban land change under the five SSPs scenarios through 2100 (Fig. 5). It is worth noting that urban extent in our study was derived from NTL data, which are not the same as the global human settlement layer used in Chen et al. 26 and Gao and O'Neill 42 , in terms of the spatial distribution and total urban areas. Hence, we compared the ratio between urban extent in the future and base year. We found that there are some consistencies between our results and the other two studies. For example, the global urban area growth under the SSP5 (fossilfueled development) is largest, while the growth under the SSP3 (regional rivalry) is relatively small. However, there are some differences as well. First, global urban areas under different SSPs in Gao and O'Neill 42 are not fully consistent with the urbanization processes observed from existing highly urbanized cities (e.g., the US cities in Supplementary Fig. 4), as reflected in our results. Urban areas in Gao and O'Neill 42 show consistent increases through 2100, especially for SSP5, SSP2, and SSP4. In fact, satellite observations revealed the growth of urban areas in North America and Europe already slowed down over past decades 17 , and this historical trend was well captured in our projection model 44 . Second, our model for urban area estimation is more theoretically based compared to data-driven approaches (e.g., Monte Carlo) in Gao and O'Neill 42 . In our model, the growth of urban areas is driven by economic development, population growth, and the historical pathway of urban area growth captured by the sigmoid-growth model, which was calibrated for each , especially in rapidly developing countries (e.g., China). This is mainly because the spatial differences and temporal dynamics of per capita urban area in different regions were used in the panel data analysis in Chen et al. 26 . As a result, the derived per capita urban area showed small changes over years, which is not fully consistent with satellite derived results (i.e., per capita urban area in China shows a noticeable increment over past decades). Nevertheless, the historical trends with distinct increment of per capital urban area and the growth rate of urban areas in China were considered in our data and Gao and O'Neill 42 . Fourth, it is worth to note that the growth of urban extents under SSPs in current studies may not reflect the narrative as it is in literal meaning. For example, there is a large amount of urban area growth under SSP1 (sustainability) in some rapidly developing countries (e.g., China) because the predictions of future urban area growth only consider horizontal expansion. Due to the availability of urban heights data, especially at the global scale, urban vertical growth, which would result in a compact and sustainable urban development under SSP1 44 , was not considered. Although Gao and O'Neill 42 manually assigned a trajectory with a relatively low urban expansion rate under SSP1 to realize sustainable land use, the vertical growth of urban extent was not considered either, like most studies 23,26,41,42,50 . The absences of global urban height datasets and vertical urban growth models are primary barriers for modeling compact and sustainable urban growth. The advent of regional built-up height dataset 51 provides the possibility to simulate the vertical growth of urban areas in future studies.
The comparison with other publicly available datasets indicates our historical modeling performs well in capturing temporal trends and spatial variations of urban extents ( Supplementary Fig. 6). We selected the US in this experiment using the most available recorded (i.e., historical settlement data compilation; HISDAC-US) 50,52 and modeled (i.e., forecasting scenarios of land use change; FORE-SCE) 53 historical urban extent data. Although the definition and spatial extent of base maps used to generate historical modeling vary among these datasets, the temporal trends of urban area growth relative to 1940 are similar (Supplementary Fig. 6). The relatively large magnitude of urban area and growth rate in HISDAC-US data is mainly attributed to the definition of urban area in each 250 m grid, in which urban area was defined if one record of built-up properties was found. It is worth to note that we regarded the built-up area in HISDAC-US as urban areas for comparison, and many isolated built-up pixels away from cities were not included in FORE-SCE and our data. The temporal trends of urban growth between the FORE-SCE and our results are similar, although the used approaches are notably different. That is, historical trends of urban area in our data mainly came from the HYDE dataset 29 , which is notably different from the estimated growth rates in multiple historical years (i.e., 1973, 1980, 1986, and 1992) using change detection in the FORE-SCE model. In addition, the dynamics of urban extent in our data are relatively consistent with those from the FORE-SCE model and HISDAC-US data, despite their differences in definition and base map. It worthy to note that this comparison was conducted in the US, and more diverse comparisons are required in future to accounting for the uncertainty of historical urban sprawl in other regions 54,55 . The integration of hindcast and projection of global urban dynamics, with a long temporal coverage (i.e., 230 years) and a moderate spatial resolution (1 km), can contribute to a variety of studies that are relevant to urbanization. For the climate change and sustainability communities, such long-term global urban extent dataset can be used as inputs to climate models to investigate the impacts of global urbanization (e.g., urban heat island) on global climate changes 41,56 . The expansion of urban extent also resulted in population redistribution, accompanied by different spatial patterns of emitted anthropogenic heat flux from different sectors (e.g., industry, traffic, building, and human metabolism), further influencing the global carbon cycle and climate change 12,57,58 . Our dataset can also be used to explore the impact of urban expansion on habitat and biodiversity loss 7 , particularly in those rapidly developing regions, where the urban CA model can also be implemented using fine spatial resolution (e.g., 30 m) urban extent data 59,60 . Moreover, this product can serve as important inputs for improving representation of urban dynamics in multisector human-Earth systems models.

Methods
Overall framework. We developed a framework by combining hindcasting and projecting urban sprawl as well as satellite observations to generate a product of global urban dynamics from 1870 (i.e., the second industrial revolution) to 2100 under different SSPs ( Supplementary Fig. 7). There are four components in this framework. The first is data preparation, including the collection of countryspecific socioeconomic data (e.g., population and GDP), spatially explicit proxies (e.g., terrain and traffic), and remotely sensed global urban extent time series 17 ( Supplementary Fig. 7a). Then, we estimated urban demands in two periods (i.e., 1870-1992 and 2013-2100), using socioeconomic data and an urban area growth model ( Supplementary Fig. 7b). After that, we calibrated the Logistic-Trend-CA model 37 and evaluated its performance using satellite derived urban extent from 1992 to 2013 (Supplementary Fig. 7c). Finally, we hindcasted historical (1870-1992) and future scenarios (2013-2100) of global urban extent in a spatially explicit manner and developed the dataset of global urban extent from 1870 to 2100 by integrating hindcasted, satellite-derived, and project urban extent (Supplementary Fig. 7d). Details of each component are presented in the following sections.
Data preparation. We collected country-specific socioeconomic (i.e., population and GDP) data to estimate urban demand changes in different countries. Historical population and GDP data were obtained from the World Bank database (http:// databank.worldbank.org/). The country-specific urban area growth model can be developed from the population and GDP data combined with the urban extent time series. To project future urban demand change, we collected these two socioeconomic variables with changes under five SSPs until 2100 45 .
Historical urban extent (1992-2013) was derived from the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS) stable nighttime light (NTL) data with a spatial resolution of 1 km 17,61 . The urban extent was defined as the region that has the largest change of NTL luminance along the urban-rural gradient, and this is consistent with high-resolution built-up area. It is worth to note this definition of urban extent is not same as that in census mainly based on urban population, in which the criteria to define urban areas could vary significantly across regions. Here, we used the NTL-derived urban extent time series data with an annual interval because other global urban extent data with finer spatial resolutions, such as the built-up maps of Global Human Settlement Layer (GHSL) 60 , are generally only available at coarse temporal resolutions. The NTLderived global urban extent data are spatially and temporally consistent by using the same approach globally 17,62 , and are reliable according to the evaluation using finer spatial resolution land cover data. The urban boundaries derived from NTL also agree well with the finer resolution built-up map of GHSL ( Supplementary  Fig. 8). It worthy to note that some small settlements with size less than 5 km 2 were not included in the NTL-derived urban extents (Supplementary Fig. 8). Indeed, the impact of removal smaller urban clusters is tiny (Supplementary Table 1), because these omitted small towns and settlements in NTL-derived urban extent data have relatively low growth rates. More details of the global urban extent data can be found in Zhou et al. 17 .
We built a global dataset of spatially explicit proxies to evaluate the suitability of urban sprawl at the pixel level. These spatial proxies include terrain, land, and urban infrastructures globally consistent and widely used in other studies 59 (Table 1). For those proxies such as major cities and traffic, we calculated each pixel's Euclidean distance to the nearest cities and roads. Given that these spatial proxies have different units and a wide range of magnitudes, we normalized them from 0-1.
Urban demands. We estimated country-specific urban demands before 1992 and after 2013 under five SSPs using different methods. For urban demand before 1992, given that there are limited socioeconomic data at the country level (e.g., population and GDP) before the 1990s, we used the temporal trend of urban area growth (1870-1992) from the History Database of the Global Environment (HYDE) database 63 . The HYDE data span from 10,000 BC to AD 2000, of which the built-up areas were estimated from demographic drivers (e.g., total and urban population) based on survey data of cities from literatures 31 . Because of the difference between urban areas from satellite observations and HYDE, a harmonized strategy using a ratio calculated in 1992 was implemented (Eq. (1)). Historical urban areas during the period 1870-1992 were estimated by multiplying urban area in the HYDE by a ratio.
where i indicates country, NTL 1992 i and HYDE 1992 i are derived urban areas in country i from NTL observations and HYDE, respectively.
For urban demand after 2013, we estimated country-specific urban demand using the urban area growth model 44 . Urban area growth in the future was determined by its historical pathways and the growth of future population and GDP. Using more than 20-year time series data of urban extent 17 and the records of population and GDP change from the World Bank database, we developed the urban area growth model for each country 44 . Thus, we projected urban area growth of global countries under different SSP narratives. Details of these SSPs and the resulting country-specific urban areas can be found in Li et al. 44 .
Logistic-Trend-CA model. We calibrated the Logistic-Trend-CA model using the global urban extent time series data from 1992 to 2000. The Logistic-Trend-CA model is an improved CA model that considers the temporal effect of urbanized pixels on the spatial expansion of urban land 37 . That is, there is a relatively higher probability of urban development for pixels that were surrounded by more recently developed urban pixels. This improved model can notably reduce errors generated and propagated during the modeling process 64 . Hence, we used the Logistic-Trend-CA model to simulate the urban sprawl in this study.
The Logistic-Trend-CA model was built upon the widely used framework of the urban CA model, which has been extensively used in urban growth simulations 65 . In general, there are three components in the urban CA model, including transition rules, neighborhood, and land constraint 21 . These components represent different influential factors during urban sprawl, resulting in the development probability P dev of conversion from non-urban to urban (Eq. (2)). We iteratively allocated derived urban areas to spatially explicit grids based on the development probability P dev within a given region and period.
where P dev is the development probability; P suit , Ω, and Land are three components representing suitability surface, neighborhood, and land constraint, respectively. We derived the suitability surface (i.e., transition rules) via calibrating the Logistic-Trend-CA model for each country using the global urban extent time series. The suitability surface represents the probability of urban development in an area with consideration of the socioeconomic and biophysical status (e.g., infrastructures, land surface, and terrain). First, we determined urbanized and persistent regions between 1992 and 2000. Then, we randomly generated training samples in urbanized and persistent areas with a sample rate of 20%. Next, we extracted spatial proxies of these samples and generated the suitability surface using the Logistic Regression (LR) model (Eqs. (3) and (4)), where P suit is the derived suitability of urban development, b 0 is the intercept, b i and x i are the i th coefficient and spatial proxy (Table 1), respectively. The value of b i can be referred to as the contribution of each spatial proxy to the urban sprawl. We improved the neighborhood component in the CA model by adding a weight factor to represent the trend of urban sprawl. The neighborhood represents the impact of surrounding neighbors on the central pixel, which is likely to be urbanized with urbanized pixels surrounding it. Here, we used the neighborhood that considers the urbanized year of neighbors, based on the widely used Moore configuration 66 (Eqs. (5) and (6)).
where Ω denotes the influence of neighborhood with the consideration of the trend of urban sprawl using a weight factor of W ts ij . N u ij is the accumulated year of cell (i, j) with the status as urban from the annual urban time series data with a temporal period of N. Thus, pixels that were urbanized more recently have relatively larger weights in calculating the neighborhood density Ω. m is the window size (set as 3), and Con() is a conditional function and returns 1 when the status of cell (i, j) is urban. In addition, water and protected areas were regarded as land constraints and are not allowed for conversion to urban in the Logistic-Trend-CA model 21 .
Hindcast and projection. We reconstructed global urban extent back to the 1870s using the calibrated Logistic-Trend-CA model. Different from projection in which the newly developed urban areas expand from the urban core to fringe areas, the hindcast excludes pixels with lower development probabilities from the initial urban areas in 1992 ( Supplementary Fig. 9). First, original urban pixels in the period T0 were ranked based on their developed probabilities. Then, urban pixels with relatively lower development probabilities during the period (T0 to T-1) were eroded from urban extent in the period T0. This process was iteratively implemented from 1992 to 1870.
We projected future global urban sprawl from 2013 to 2100 under five SSP scenarios by using the calibrated Logistic-Trend-CA model. The differences of SSPs are mainly reflected in the projected urban demands, which would be spatially allocated in a spatially explicit manner. Within each country, we used the Logistic-Trend-CA model to allocate increased urban demand to each 1-km grid. Finally, we combined hindcasting and projecting urban sprawl results and generated the long-term global urban extent product from 1870 to 2100.
Performance of the Logistic-Trend-CA model. The performance of the calibrated model was evaluated from two aspects. First, we assessed the suitability surface calibrated from the time series of urban extent data during 1992-2000 using the Receiver Operating Characteristic (ROC) approach 59 . The ROC approach evaluates the performance of the LR model by setting different thresholds over the whole domain of predicted probabilities, forming a continuous curve. The area under the curve (AUC) is commonly used as a quantitative indicator for assessment, and a higher value of AUC indicates better performance of the derived suitability surface. Second, we evaluated the simulated urban extent during 2000-2013 using two metrics: overall accuracy (OA) and figure of merit (FOM). The OA depicts the overall agreement of the modeled and observed urban maps, and the FOM characterizes the agreement between modeled and observed maps on changed pixels relative to the initial year of 2000 ( Supplementary Fig. 10) (Eq. (7)) 67 . The FOM (also called the Jaccard index or the Intersection-over-Union indicator in computer sciences) can provide a relatively comprehensive evaluation of model performance, and it has been widely used for model assessment in urban sprawl modeling.
where FOM is the figure of merit, B is the number of pixels that were observed and simulated as urban; A is the number of pixels that were observed as urban but simulated as non-urban; C is the number of pixels that were observed as non-urban but simulated as urban.
The Logistic-Trend-CA model performs well in simulating urban sprawl in most countries ( Supplementary Fig. 11). Overall, the mean of AUC for all global countries is 0.89, suggesting a good performance of the calibrated logistic regression model (Supplementary Fig. 11a) 68 . That is, the generated suitability surface can adequately characterize the difference of urbanized and persistent regions from these spatial proxies. AUCs for countries in Asia and Africa are higher than in other regions, indicating the calibrated model can capture the rapid growth of urban sprawl in developing countries. Besides, the dominant factor of urban sprawl (i.e., the spatial proxy with the largest weight derived from the LR model) varies among countries. Infrastructures (e.g., the distance to city centers, highways, and major roads) are the dominant factors for most countries in the world (Supplementary Fig. 11b). Variations of the dominant spatial proxies across countries reflect different sprawl patterns of urban areas. Such patterns are essentially related to factors such as development levels and urban planning 47 .
The Logistic-Trend-CA model distinctively outperforms the traditional Logistic-CA model ( Supplementary Fig. 12) according to the indicator of FOM. The mean FOM in the Logistic-Trend-CA model is 43%, which is about 10% increase compared to the Logistic-CA model. All settings in the Logistic-Trend-CA and Logistic-CA models are the same, except for the neighborhood, where the Logistic-Trend-CA model considers the temporal effect of urbanized neighbors. Comparison of these two urban CA models suggests the impact of urbanized neighbors at different years on urban sprawl is noticeable, especially for urban sprawl spanning a long period 37 . For example, we examined the trend of accuracy measures (i.e., OA and FOM) from 2000-2013 in three representative countries (i.e., the US, China, and India) (Supplementary Fig. 13). In each year within the validation period (2001-2013), we compared the FOM with urban extent in 2000. Overall, we observed an opposite trend of OA and FOM with the increase of modeling years.
That is, the OA shows a consistent decreasing trend while the FOM is increasing during the modeling period. This phenomenon was caused by error generation and propagation of the urban CA model 69 . The decrease of OA was caused by the error accumulation during the modeling, while the increase of FOM suggests the neighborhood plays a crucial role in selecting those urbanized pixels around the initial urban extent. In addition, we observed that the FOM derived from the Logistic-Trend-CA model is higher than that from the Logistic-CA model, particularly for years after 2010. This is mainly driven by the significantly increased urban areas in recent years, i.e., the mean growth rate during 2010-2013 in China (13,002 km 2 /y) is about 2.3 times faster than that during 2000-2010 (5643 km 2 /y). In addition, it is worth to note that it is difficult to quantify the model accuracy for the far-range prediction and hindcasting results due to the availability of observations. Instead, it is more useful to explore the diverse urban sprawl scenarios in the future, mainly driven by the unique trend of urban area growth in different regions and socioeconomic and climate pathways 26,42 .

Data availability
Historical country-specific urban area data before 1992 were obtained from the History Database of the Global Environment (HYDE) database 63 (https://www.pbl.nl/en/image/ hyde). The projected country-specific urban area data after 2013 under five Shared Socioeconomic Pathways were available at the Figshare repository (https://doi.org/ 10.6084/m9.figshare.7817624.v1) 44 . The generated long-term global urban extent dataset includes the hindcasted urban extent (1870-1990); satellite-derived urban extent (1992-2013); and the projected urban extent (2020-2100) under the five SSPs. The spatial resolution of this dataset is 30 arc-second (~1000 m at the equator). The uploaded data are in GEOTIFF file format at the Figshare repository (https://doi.org/10.6084/ m9.figshare.9696218) 70 .  72 Infrastructure Major cities World cities -city center 73 Traffic World roads -major road 74