Prediction of transportation index for urban patterns in small and medium-sized Indian cities using hybrid RidgeGAN model

The rapid urbanization trend in most developing countries including India is creating a plethora of civic concerns such as loss of green space, degradation of environmental health, scarcity of clean water, rise in air pollution, and exacerbated traffic congestion resulting in significant delays in vehicular transportation. To address the intricate nature of transportation issues, many researchers and planners have analyzed the complexities of urban and regional road systems using transportation models by employing transportation indices such as road length, network density, accessibility, and connectivity metrics. This study addresses the complexities of predicting road network density for small and medium-sized Indian cities that come under the Integrated Development of Small and Medium Towns (IDSMT) project at a national level. A hybrid framework based on Kernel Ridge Regression (KRR) and the CityGAN model is introduced to predict network density using spatial indicators of human settlements. The major goal of this study is to generate hyper-realistic urban patterns of small and medium-sized Indian cities using an unsupervised CityGAN model and to study the causal relationship between human settlement indices (HSIs) and transportation index (network density) using supervised KRR for the real cities. The synthetic urban universes mimic Indian urban patterns and evaluating their landscape structures through the settlement indices can aid in comprehending urban landscape, thereby enhancing sustainable urban planning. We analyzed 503 real cities to find the actual relationship between the urban settlements and their road density. The nonlinear KRR model may help urban planners in deriving the network density for GAN-generated futuristic urban patterns through the settlement indicators. The proposed hybrid process, termed as RidgeGAN model, can gauge the sustainability of urban sprawl tied to infrastructure and transportation systems in sprawling cities. 
Analysis results clearly demonstrate the utility of RidgeGAN in predicting network density for different kinds of human settlements, particularly for small and medium Indian cities. By predicting future urban patterns, this study can help in the creation of more livable and sustainable areas, particularly by improving transportation infrastructure in developing cities.


Introduction
Mapping urban land use dynamics has been valuable research in urban studies over several decades. The advancement in remote sensing technology makes it possible to track the spatiotemporal changes in urban landscape structures with relatively high accuracy and on a required scale [1]. The spatial distribution of land use activities (residential, commercial, industrial, etc.) including the transportation system is important for understanding the current urban centers and for planning future city development [2]. Land use and land cover maps become a starting point for modeling urban patterns and they can infer the future urban growth and the direction of land expansion of cities to inform urban planners and government policymakers towards sustainable urban planning [3][4][5]. While urban areas continue to experience rapid growth, they pose new challenges to the nation, especially for developing and underdeveloped countries [6]. Hence, urban growth prediction models and related studies have become a hot topic that has been extensively and deeply investigated. The existing urban growth models [7][8][9][10] include the driving factors which affect urban expansion. These factors influencing urban expansion include population growth, economic development, urbanization, transportation infrastructure, topography, and land use regulations [11]. However, in developing and underdeveloped nations, where urban expansion is more likely to occur, data on driving forces are hard to obtain and are often expensive to collect.
According to the UN report [12], India is the most populous country in the world. From 1901 to 2011, the country's urban population expanded by around 14 times [13]. Although largely unequal, its increase is not skewed and is not limited to one region across the nation. The skyrocketing living costs in metropolitan areas and increasing house rents discourage enterprises from investing in major cities. Therefore, it is essential to assess the settlement pattern and infrastructure facilities of small and medium towns as an alternate option to larger metropolitan cities [14]. These towns, sometimes called the "next billion" markets, will be important in propelling the expansion of the national economy in India [15]. Further to the intricacy of urban settlement, more basic infrastructure and facilities are required for these regions.
arXiv:2306.05951v1 [cs.LG] 9 Jun 2023
Various reports in recent years have estimated a massive demand for funding urban infrastructures in developing countries. The World Bank estimates that nearly 70,000 billion INR of investment in urban India will be required to meet the growing population demands in the next 15 years until 2036 (in 2020 prices) [16]. In response to such needs, the Indian Government introduced a scheme called the Integrated Development of Small and Medium Towns (IDSMT) project that aims to encourage the planned and sustainable growth of the nation's small and medium-sized towns or cities [15]. The Ministry of Urban Development, Government of India introduced this scheme in 2005, and it offers financial and technical assistance to local governments in order to help them develop their towns' infrastructure and fundamental services. This scheme focuses on the expansion of small and medium-sized towns (population up to 500K), which may act as growth hubs for the nearby rural regions, in order to promote inclusive growth and balanced regional development. By providing funding for the construction of fundamental utilities including water supply, sanitation, solid waste management, and urban transportation, the program seeks to solve the infrastructure deficit and service shortages in these communities. The overall goal of the IDSMT program is to support sustainable urban growth and raise the standard of living in India's small and medium-sized towns. As a result, it is important to examine the small and medium towns (Tier 3 and above cities with populations up to 500K) in India. Adopting cutting-edge technologies can have a significant impact on enhancing the Government's effectiveness in planning and decision-making, problem-solving, accelerating development, and deployment [17].
To mitigate this urban planning challenge, recent developments in machine learning and deep learning have become handy tools for urban planners and geoscience practitioners. Deep learning models such as Generative Adversarial Networks (GAN) can approximate complex, high-dimensional probability distributions [18]. GANs have achieved numerous state-of-the-art breakthroughs in the fields of computer vision [19], natural language processing [20], and more recently in urban science and the geospatial domain [21,22]. Several GAN-based models have been proposed to simulate hyper-realistic urban land use maps and generate a synthetic urban universe without considering the driving factors; see, for example, CityGAN [23] and MetroGAN [24]. Among these, CityGAN [23] simulates urban patterns using a global urban land-use inventory and builds an "urban universe" to reproduce the complex spatial patterns observed in global cities. An extension of CityGAN that incorporates geographical knowledge is called Metropolitan GAN (MetroGAN) [24], which learns hierarchical features for urban morphology simulation. Another deep learning method, namely U-Net [25], has also been applied to generate future urban cities using water bodies, digital elevation models, and nighttime lights as inputs [22]. The application of previous GAN-based urban models was limited to the generation of urban patterns. There are limited works on quantifying the urban pattern of GAN-generated images and predicting the transportation metric (representing urban infrastructure) for these new urban regions. Thus, quantification and modeling of landscape patterns of those GAN-simulated cities remain an unattempted problem. The structure of a landscape emerges from the characteristics of the individual elements of an ecosystem and their spatial configuration [26][27][28][29]. Human Settlement Indices (HSI), e.g., Class Area (CA), Number of Patches (NP), Largest Patch Index (LPI), Clumpiness Index (CLUMPY), Aggregation Index (AI), and Normalized Landscape Shape Index (NLSI) [30][31][32], provide concrete information about the landscape structures and therefore contribute to the prediction of the Transportation Index (TI). This paper attempts to answer the following challenging questions: (a) How to generate an urban universe for India based on spatial patterns via learning urban morphology? (b) Is there a relationship between HSI and TI in small and medium cities in India? (c) How to predict (forecast) TI for synthetic urban cities generated by CityGAN for developing countries like India?
To generate small and medium-sized Indian cities with CityGAN, we collected the World Settlement Footprints (WSF 2019) maps which are publicly available and the best representations of urban patterns as input features [33,34]. Then, we build a city image database of 503 small and medium Indian cities whose populations range between 20K and 500K. Each city image represents 10.5 × 10.5 km covering the urban center and surrounding regions. We also explored existing spatial and statistical measures to evaluate the performance of CityGAN in Indian cities due to the complex nature of individual cities with varying structural and hierarchical properties [23]. Assessing the spatial relationship between urban patterns (human settlement) and the transportation index of actual cities can help to build a model for predicting the transportation index for generated cities. We used different linear and nonlinear measures of statistical correlations to establish this spatial relationship. Furthermore, we propose a hybrid model (namely, RidgeGAN) to predict the transportation index for simulated urban patterns. Fig. 2 depicts the methodological framework. In RidgeGAN, a supervised learning model (KRR) builds a relationship between the human settlement patterns and the characteristics of the urban road transportation system and implements this to predict TI for the GAN-simulated urban universe for India. Our proposal has numerous applications, ranging from understanding urban land patterns to predicting relevant urban infrastructure facilities to guiding policymakers toward a better and more inclusive planning process.

Applications of GANs in geospatial field
Deep learning has reached a significant milestone in geospatial research, computer vision and other cutting-edge technologies [35]. GAN [18], an essential subfield of unsupervised deep learning, has opened a new vista for geoscience research in recent years. GANs are utilized to generate data that is close to a given training set, which can consist of images, texts or tabular data [36]. Geoscientists and urban planners have adopted this new deep learning methodology for handling geophysical and remote sensing data. In remote sensing, MARTA GANs were proposed for producing fake satellite images of urban environments [37]. The model consists of a discriminator network that receives both real and synthetic images as input and predicts whether each image is genuine or synthetic, as well as a generator network that uses random noise as input to create synthetic images. Further, Spatial Generative Adversarial Networks (SpaGANs) [38] were introduced for synthesizing textures by incorporating spatial information (such as the position and orientation of the texture) into the generator and discriminator networks. A comprehensive review of GANs demonstrates promising performance in the built environment, from processing large-scale urban mobility data and remote sensing images at the regional level to performance analysis and design generation at the building level [22].
In addition, GANs were also applied to modeling global urban patterns. For example, CityGANs [23], conditional GANs [39], and MetroGANs [24] were built to generate urban land patterns by training the generator networks to produce synthetic images of urban areas that closely resemble real urban areas, which is useful for analyzing urban human settlement data from space-based sensors. These models are very effective in simulating urban land use patterns, and thus in estimating urbanization parameters or spatial indices more accurately in locations where local data is unavailable or impossible to collect. GANs are used to model hyper-realistic settlement patterns since they do not make any assumptions about the data distribution and can generate real-like samples from the latent space in a straightforward manner. This property lends GANs to a variety of geospatial applications, including image synthesis, image attribute editing, image translation, domain adaptation, and other computing fields [40].

Transportation and urban landscape structures
The spatial structure of a city is extremely complex and is constantly evolving. Therefore, there are significant attempts to analyze cities, and thus to link urban policy to shape cities. Delineating homogeneous/heterogeneous human settlements, quantifying them, and analyzing their diversity and spatial organization are necessary to assess their structures and spatial patterns. For this reason, urban researchers have utilized landscape metrics to quantify the qualities of the landscape related to shape, pattern, and area by measuring the structure and spatial distribution of settlements. Landscape metrics were originally introduced in ecological studies that reflected social, cultural, and ecological richness and heterogeneity [41]. Also, progressive and well-functioning urban planning departments can use spatial indicators to regularly monitor urban development and, when necessary, propose regulatory or public investment action [42]. These indicators can also evaluate the geometrical characteristics of ecological processes and landscape elements, as well as their relative locations and distribution [43]. The effect of landscape metrics on spatial patterns was studied to quantify landscape structures and these metrics can statistically determine the outcome [31]. Spatial patterns of urban growth and landscape metrics were studied for various cities in India [30,44]. The integration of efficient urban structures and comprehensive transportation metrics plays a vital role in fostering sustainable development and improving the overall livability of cities.
Understanding the interaction between transportation infrastructures and urban pattern areas is always critical for driving smooth urban services. The investigation into the development of mathematical models for studying the relationship between transportation metrics and urban land use began in the early 1960s and technological advancements brought us to an era of integrated land use transportation modeling [45]. Several road network models have been developed to solve transportation problems. Most of the existing transportation prediction models are based on simple linear regression. An inverse relationship between urban growth and transportation was found for the Middle East regions [46]. That analysis suggested that urban population growth has increased urban trips and increased travel demand due to transportation infrastructure.
In a recent study, the link between population and the characteristics of the road network in the Lebanese Republic was investigated using a multivariate regression model to estimate the population count based on various data sources and statistical modeling techniques [47]. However, linear regression models have the drawback of excluding all variables which are not linearly connected, and multicollinear variables adversely affect the model. Besides, it has been shown that the relationship between transportation features and their influencing factors is not always linear in nature. Predicting the transportation index is an important step towards minimizing traffic congestion and providing critical information to individual travelers as well as various Government sectors to plan the city in a sustainable way. A support vector regression (SVR) approach was used to predict traffic flow from California highways using different types of kernels [48]. A road network density (one of the transportation indices) prediction model was proposed using highway capacity, and turning probabilities manual methods were used to determine the shortest cycling time in metropolitan areas [49]. The model could aid in vehicle distribution and congestion relief in urban areas. Further, the concept of graph theory was used to analyze the topology of road networks in an Indian city to better understand the connectivity and coverage of the existing road transportation system [50]. Those findings indicate that there is a strong relationship between road connectivity and coverage and that improving the road network is essential for a reliable and safe road transportation system. To accurately estimate the connectivity index, the paper [50] proposed a model based on the relationship between the Eta index and Network Density (ND), Edge Graph Density (EGD) and Nodal Graph Density (NGD).

Results
The scholarly literature on urban challenges primarily focuses on megacities and large urban centers, but there are a great number of small and medium-sized cities in developing countries that should be prioritized. There is a pressing need to address the challenges of developing transportation networks and settlement patterns in these cities. Previous literature on urban studies does not adequately address the transportation, unplanned city growth and socioeconomic and environmental challenges of small and medium-sized towns [51]. In this study, we utilize global training WSF data across India, and we demonstrate a simple and unconstrained GAN model to generate realistic settlement patterns that encompass the diversity of urban forms. We are primarily interested in how small and medium-sized cities are simulated using unsupervised CityGANs. Subsequently, an effective data-driven hybrid model is developed to predict road network density for a given urban settlement pattern.

Study area
Our study focuses on small and medium-sized Indian cities (South Asia), one of the world's fastest-urbanizing regions. The selection of the study area involved the identification of the geographical location and corresponding demographic data, for which the population data from the World Cities database (https://simplemaps.com/data/world-cities) were used. Approximately 503 cities out of 1600 were selected for our study, with population sizes ranging between 20k and 500k. The geographical locations of the study areas are marked on the map of India in Fig. 1.

Data collection and preprocessing
Settlement footprints and transportation network datasets of all the selected cities were collated from various sources over 2019. Settlement maps were procured from the global inventory built-up database called WSF, published by the German Aerospace Center (DLR). These are binary maps (urban area pixels have a value of 1 and non-urban area pixels have a value of 0) derived from multi-temporal space-borne satellites, namely Sentinel-1 and Sentinel-2 data, that aided in estimating the human settlement pattern of an urban area (10.5 × 10.5 km) with a resolution of around 10 m/px. Open source GIS software (QGIS) was used to pre-process (rectify, project, and crop) the images and build a city database. To measure Human Settlement Indices (HSI), existing landscape metrics were selected and computed using the FRAGSTATS software [52]. It includes the packages to compute popular landscape metrics using spatial pattern analysis. We use six settlement indices: total class area (CA), number of patches (NP), largest patch index (LPI), clumpiness index (CLUMPY), aggregation index (AI) and normalized landscape shape index (NLSI) to estimate the characteristics of human settlement [30,31].
CA is a useful metric to depict the spatial extent of the settlements. It is a composition metric that specifies the extent of landscape that is made up of a specific class type (e.g., built-up area). The total class area is the sum of the areas (m²) of all the patches of the relevant patch type divided by 10,000 (to convert to hectares); CA > 0, without an upper limit. NP in each landscape indicates the degree of fragmentation by counting the number of human settlements or urban patches. The higher the value of NP, the higher the fragmentation, with no upper limit. At the class level, LPI estimates the percentage of the total landscape area occupied by the largest patch as an indicator of dominance. LPI is calculated by dividing the area (m²) of the largest patch of the relevant patch type by the entire landscape area (m²) and multiplying the result by 100 (to convert to a percentage), i.e., LPI is the percentage of the landscape comprised by the largest patch. LPI values (0 < LPI < 100) decrease from the city center to the outskirts. Another human settlement metric, CLUMPY, deals with aggregation and disaggregation for adjacent settlements. It shows the frequency with which different pairs of patch types appear side by side. The value ranges from -1 to 1: -1 indicates a maximally disaggregated patch type, 0 a randomly distributed patch type, and 1 a maximally aggregated patch type [52]. Another metric, AI, is calculated using an adjacency matrix, which indicates how frequently distinct pairs of patch types (including adjacencies between the same patch type) appear side by side in the settlement map. Its values range from 0 to 100; low AI values indicate maximum disaggregation, and high AI values indicate a maximally aggregated, single compact patch. Finally, NLSI provides a measure of class aggregation whose values range from 0 to 1, where 0 means the landscape consists of a single square or maximally compact (i.e., almost square) patch. NLSI increases as the patch type becomes increasingly disaggregated and reaches 1 when the patch type is maximally disaggregated [30].
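The three simplest indices defined above (CA, NP, LPI) can be illustrated on a toy binary raster. The following is a minimal sketch, not the FRAGSTATS implementation used in the study: the function name `settlement_indices`, the 8-neighbour patch rule, and the 10 m pixel size are assumptions made for illustration.

```python
from collections import deque

def settlement_indices(grid, pixel_size_m=10.0):
    """Return (CA in hectares, NP, LPI in percent) for a binary grid.

    grid: list of rows, 1 = built-up pixel, 0 = non-built-up pixel.
    Patches are 8-connected groups of built-up pixels (an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    patch_sizes = []  # pixel count of every patch found so far
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one patch.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            patch_sizes.append(size)
    pixel_area = pixel_size_m ** 2                      # m^2 per pixel
    ca_ha = sum(patch_sizes) * pixel_area / 10_000      # class area in hectares
    np_count = len(patch_sizes)                         # number of patches
    landscape_area = rows * cols * pixel_area           # whole scene in m^2
    lpi = 100.0 * max(patch_sizes, default=0) * pixel_area / landscape_area
    return ca_ha, np_count, lpi

# Toy 4 x 4 scene with two separate built-up patches.
toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
ca, n_patches, lpi = settlement_indices(toy, pixel_size_m=10.0)
```

CLUMPY, AI, and NLSI depend on adjacency counts and perimeter bounds and are left to the dedicated software.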
To compute the Transportation Index (TI), a layer of the road network that has been topologically cleaned and converted into polylines is a prerequisite. The application software used here is integrated with QGIS 3.30 for this purpose. Individual cities and their corresponding road networks were extracted for assessing the spatial patterns of the road systems corresponding to the respective cities. As transportation measures need to be calculated in a metric system, a projected coordinate system was used. Instead of the common WGS84 - EPSG:4326, which uses degrees as a unit for distance, the Coordinate Reference System WGS84/UTM-EPSG was used here, to measure road length in meters. All categories of roads such as National highways, State highways, major roads, street roads, residential paths, footways, and service roads were included in this study. To measure the development of the urban road network, the network density of the respective cities was computed as follows:
\[
\text{Network Density (ND)} = \frac{L}{A} = \frac{\text{Total length of the road network}}{\text{City area}},
\]
where L is determined from road maps and has been calculated using an open field calculator in QGIS software. Network length specifies the total span of the road network and network density is measured according to the area occupied by road networks (city area), denoted as A. Fig. 2 shows the overall workflow of the proposed hybrid framework used in this study to predict the transportation index for any kind of urban pattern in Indian cities.
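The network density formula can be sketched in a few lines once road polylines are available in projected (metre) coordinates. This is an illustrative sketch only: the helper names, the toy road geometries, the use of the full 10.5 km × 10.5 km scene as the city area A, and the choice of km per km² as the reporting unit are all assumptions, not details taken from the study.

```python
import math

def polyline_length_m(points):
    """Length of one polyline given as [(x, y), ...] in projected metres."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def network_density(roads, area_m2):
    """ND = L / A, reported here in km of road per km^2 (an assumed unit)."""
    total_length_km = sum(polyline_length_m(r) for r in roads) / 1_000
    area_km2 = area_m2 / 1_000_000
    return total_length_km / area_km2

# Two hypothetical road polylines in UTM-style metre coordinates.
roads = [
    [(0, 0), (3000, 4000)],              # one straight 5 km segment
    [(0, 0), (0, 2000), (2000, 2000)],   # two 2 km segments
]
nd = network_density(roads, area_m2=10_500 * 10_500)
```

Working in a projected coordinate system matters here: the same Euclidean length computation on degree coordinates would not yield metres.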

Validation metrics
In the subsequent section, we estimate the Average Radial Profile (ARP) for real and generated urban patterns to assess the accuracy of CityGAN quantitatively. An example of computing the ARP of a city is illustrated in Fig. 7, and using a peak search algorithm, we determined the polycentricity of actual and simulated scenes from the radial profiles. ARP (h(d) or x(d)) represents how much the human settlement area changes as we move outward from the city center. As indicated in Fig. 7(b), we draw rings at a distance d from the center with a width of ∆d. By averaging the entire settlement area inside rings of width ∆d at a distance d from the city center, we can compute ARP (h(d)). The region inside the ring with radius d is denoted as R(d), and each pixel inside the ring has some built-up area, that is, the amount of urbanized area denoted as H(u, v), where (u, v) is a point inside the ring:
\[
h(d) = \frac{1}{|R(d)|} \sum_{(u, v) \in R(d)} H(u, v), \tag{1}
\]
where R(d) can be defined as the collection of all the two-dimensional points included within the ring, |R(d)| indicates the size of set R(d), and h(d) is the average radial profile of a city.
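The ring-averaging idea behind the ARP can be sketched with NumPy as follows. This is a hedged illustration rather than the authors' code: the function name `average_radial_profile` is hypothetical, the city center is assumed to coincide with the image center, and the ring width is expressed in pixels.

```python
import numpy as np

def average_radial_profile(H, ring_width_px=1.0):
    """h(d): mean built-up value over rings of width ring_width_px
    around the image centre (assumed to be the city centre)."""
    rows, cols = H.shape
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    v, u = np.indices(H.shape)                    # pixel row/column coordinates
    dist = np.hypot(u - cx, v - cy)               # distance of each pixel to centre
    ring = (dist / ring_width_px).astype(int)     # ring index of each pixel
    sums = np.bincount(ring.ravel(), weights=H.ravel().astype(float))
    counts = np.bincount(ring.ravel())            # |R(d)| for every ring
    return sums / np.maximum(counts, 1)           # guard against empty rings

# Toy scene: a single built-up pixel at the centre of a 9 x 9 image.
H = np.zeros((9, 9))
H[4, 4] = 1
profile = average_radial_profile(H)
```

For a binary settlement map, each entry of `profile` is the built-up fraction of the corresponding ring, so a monocentric city shows a profile that starts high and decays with distance.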
There are several statistical measures that are used to assess supervised regression models, e.g., mean squared error (MSE), mean absolute error (MAE), R-squared (R²), and Adjusted R-squared (Adj R²). MSE is the average squared difference between the predicted and actual values. It is a widely used metric that measures the quality of a regression model, whereas MAE is the average absolute difference between the predicted and actual values. MAE is a robust metric that is less sensitive to outliers than MSE. R-squared measures the proportion of variance in the target variable that is explained by the model. It ranges from 0 to 1, with higher values indicating a better fit. Adjusted R-squared is a modified version of R-squared that takes into account the number of predictors in the model. It penalizes the model for adding unnecessary variables and is a better measure of a model's goodness of fit [53].
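The four scores can be written out directly from their standard definitions. The sketch below uses plain Python; the function name `regression_scores` and the toy numbers are illustrative assumptions, and the adjusted R² uses the usual formula 1 − (1 − R²)(n − 1)/(n − p − 1) with n samples and p predictors.

```python
def regression_scores(y_true, y_pred, n_predictors):
    """MSE, MAE, R^2 and adjusted R^2 for paired lists of values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    mse = ss_res / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"MSE": mse, "MAE": mae, "R2": r2, "AdjR2": adj_r2}

# Toy values: four perfect predictions and one that is off by 2.
scores = regression_scores(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 2.0, 3.0, 4.0, 7.0],
    n_predictors=1,
)
```

The single large error contributes 4 to the squared-error sum but only 2 to the absolute-error sum, which is why MSE reacts more strongly to outliers than MAE.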

Generating human settlement area from WSF 2019
A dataset of real human settlement images was collected and pre-processed before training the GAN. We utilized square settlement maps of cities from the WSF 2019 database. We clipped them to a 10.5 km × 10.5 km spatial extent and resized each image to 256 × 256 (43 m/px) for optimization purposes and to avoid overfitting. The final input dataset contains 503 binary images and can be formulated as $H = \{h_1, \ldots, h_n\}$, with each $h_i \in \mathbb{R}^{W \times W}$, $W = 256$ and $n = 503$. Each $h_i$ is an urban binary map (1 and 0 represent urban and non-urban areas respectively). The generator network is trained to generate synthetic urban settlement images by taking random noise and transforming it into an urban image whose distribution matches the real urban images. The discriminator network is trained to distinguish between real and synthetic binary images. Here, the generator G takes a random noise vector $z$ as input, which it deterministically transforms (e.g., by passing it through successive deconvolutional layers if G is a deep CNN) to generate a sample fake human settlement image $H_{fake} = G(z)$. Then the discriminator (D) accepts an input image H (which can be an actual human settlement image ($H_{real}$) from an empirical dataset or an image ($H_{fake}$) generated synthetically by the generator) and outputs the probability that H is sampled from the real distribution ($H_{real}$) rather than produced by the generator ($H_{fake}$). Having trained a generator (G) (refer to Fig. 2(a)), we generated synthetic Indian urban settlement patterns of 500 binary images using the CityGAN model. Fig. 3 illustrates randomly selected real cities (Fig. 3(a)) and simulated urban patterns (Fig. 3(b)). On visual inspection, simulations are practically indistinguishable from the actual urban patterns, with realistic densities and complexity of settlement patterns. Input images and generated images exhibit realistic concentration at the center and distribution of settlement in the surrounding regions. Various quantitative metrics as discussed earlier are used to evaluate the performance of the Indian CityGAN model [42]. Among various spatial statistical measures, ARP [23] is used to compare the real and simulated urban patterns. We utilize Eq. 2 to compile the polycentric nature of real and generated images via the peak search algorithm illustrated in Fig. 7. The peak search algorithm finds points in univariate profiles whose value (peak height) is a fraction of the maximum height h and at a distance from the previously identified peak of at least d. We set an acceptable value of h = 80% and d = 430 m via the cross-validation method. Graphical representations of peak search outcomes are illustrated in Figs. 4(a) and (b).
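The peak search described above can be sketched as a simple greedy scan over a radial profile. This is an assumption-laden illustration, not the algorithm of Fig. 7: the function name `find_profile_peaks` is hypothetical, the profile endpoints are allowed to count as peaks (so that a center-dominated city registers a peak at d = 0), and the minimum distance is given in profile samples (d = 430 m corresponds to 10 samples at 43 m/px).

```python
def find_profile_peaks(profile, height_fraction=0.8, min_distance=10):
    """Indices of local maxima that reach height_fraction * max(profile)
    and lie at least min_distance samples after the previously kept peak."""
    threshold = height_fraction * max(profile)
    peaks = []
    for i, value in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < len(profile) - 1 else float("-inf")
        is_local_max = value > left and value >= right
        if not is_local_max or value < threshold:
            continue
        if peaks and i - peaks[-1] < min_distance:
            continue  # too close to the previously identified peak
        peaks.append(i)
    return peaks

# Toy profile with a dominant centre and one secondary sub-centre.
profile = [1.0, 0.6, 0.3, 0.2, 0.5, 0.9, 0.4, 0.1]
peaks = find_profile_peaks(profile, height_fraction=0.8, min_distance=3)
n_centres = len(peaks)  # more than one peak suggests a polycentric scene
```

Raising `min_distance` merges nearby sub-centres into one, which is why the text tunes h and d together.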
The distribution of the number of peaks for real and fake cities is compared (refer to Fig. 5(a)) and similarities between the distributions are found. Further, we cluster the radial profiles of real cities using the K-means algorithm [54,55] and compare them to the typical profile of generated cities. Fig. 5 presents a summary of our findings from the CityGAN model. Our analysis suggests that K* = 10 (refer to Fig. 4(c)) gives the optimal number of clusters for both actual and synthetic scenes using a straightforward fraction of the sum-of-squares argument [55]. In Fig. 5(c), the distribution of scenes by class is given and they have a similar shape and are more comparable. But in Fig. 5(b), we find more disparities for classes 1 and 4 (monocentric), 6, 8, and 11 (sprawled patterns). These discrepancies may result from a sampling technique that would have favored the abundance of mono-centric urban patterns while the simulation was produced regardless of the location of the urban center. Experimental results show that using the WSF dataset, CityGAN generates precise urban patterns for Indian cities.

Relationship between human settlement and transportation network of real cities
HSIs and TI are computed for the selected cities (workflow is illustrated in Figs. 2(b) and (c)). The outcomes of the analysis of human settlements using selected spatial indices are displayed in Table 1. Table 2 provides examples of the calculation of the transportation index (network density) and Table 3 shows descriptive statistics of network density. The results show that the spatial distributions of network density vary among cities. Once the metrics are derived, correlation coefficients (CC) are calculated to determine the relationship between the human settlement indices (CA, NP, CLUMPY, LPI, AI, and NLSI) and the transportation indices (RL, ND). The heat maps of the correlations between transportation indices and human settlement indices are illustrated in Fig. 6.
Table 3. Descriptive statistics of transportation metrics (namely road length)
Fig. 6(a) shows the matrix of Pearson's correlation coefficient (PCC) values as a measure of linear relationship (ranging from −1 to 1). Its value reflects the strength of the link between metrics. Positive values indicate that two variables increase together, and negative values indicate that one decreases as the other increases. Here, the correlation index is displayed by the color intensity as well. The correlation increases as the color bar rises; light yellow color denotes a lower correlation. As shown in Fig. 6(a), the coverage measures RL and ND are highly correlated with each other; therefore we choose ND as the response variable of TI. Settlement metrics such as CLUMPY and NLSI have the minimum correlation with transportation metrics (light blue color), while CA demonstrates the maximum value of correlation coefficient with transportation variables. The correlation coefficients of RL and ND with CA are 0.7 and 0.80 respectively. According to PCC, the highest correlation exists between CA and RL, and between CA and ND. Hence, CA is an indispensable variable in our regression model. Nonlinear relationships between HSIs and TIs are explored using Chatterjee correlation coefficients (CCC) and reported in Fig. 6(b). In the case of CCC, values reflect that most variables have positive correlations and CA has the highest rank followed by NP and LPI.
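To make the difference between the two coefficients concrete, the sketch below computes Pearson's coefficient and Chatterjee's rank coefficient on synthetic data with a strong but non-monotone dependence. The function names and the toy data are assumptions for illustration; the Chatterjee formula used is the no-ties version, 1 − 3 Σ|r_{i+1} − r_i| / (n² − 1), where r_i are the ranks of y after sorting the pairs by x.

```python
import math

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def chatterjee_xi(x, y):
    """Chatterjee's rank coefficient, assuming no ties in x or y."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])   # sort the pairs by x
    y_sorted = [y[i] for i in order]
    rank_of = {v: r for r, v in enumerate(sorted(y_sorted), start=1)}
    ranks = [rank_of[v] for v in y_sorted]
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)

# Synthetic U-shaped dependence: y is a deterministic function of x,
# yet there is almost no linear trend.
x = [float(i - 50) for i in range(101)]
y = [(v - 0.25) ** 2 for v in x]
r = pearson(x, y)
xi = chatterjee_xi(x, y)
```

On this toy data the Pearson value stays close to zero while the Chatterjee value is close to one, which is the kind of nonlinear association the CCC heat map is meant to expose.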

Prediction of transportation index
A supervised machine learning approach (Kernel Ridge Regression) was implemented to predict road network density for given settlement patterns (refer to Fig. 2 (d)). To train and evaluate the prediction model, the raw dataset (database of 503 real cities) was divided into two parts: training (80%) and testing (20%). To select the best-fit model, we compare its performance with that of other supervised regression models: support vector regression (SVR), decision tree (DT), gradient boosting (GB), multilayer perceptron regression (MLP), XGBoost (XGB), linear regression (LR), random forest (RF), and simple ridge regression (RR). To validate the models, four statistical scores were used: Mean Squared Error (MSE), Mean Absolute Error (MAE), R-Squared, and Adjusted R-Squared. The results of our model and the comparison with other models are summarized in Table 4. The experimental results show that KRR outperforms all eight other state-of-the-art regression models, with the highest R² and Adjusted R² and the lowest error metrics (MSE and MAE values). Because of the ability to handle nonlinearity and

Discussion
India is now the most populous country in the world 12, with more than 30% of the population residing in urban areas. The Government of India recently introduced a scheme called IDSMT 15 to improve urban planning and road networks for the development of small and medium-sized cities (population size up to 500k). The Ministry of Urban Development offers financial and technical assistance to local bodies in developing their region's infrastructure and basic services to promote inclusive growth and balanced regional development. One of the objectives of this scheme is to address the transportation infrastructure deficits and service shortfalls of the eligible cities. However, the Central Government finds it difficult to decide how to allocate funds for transportation infrastructure development.
This study proposes a hybrid model to predict road network density for small and medium-sized settlement patterns in India. We used the publicly available WSF datasets and the CityGAN model to build an unsupervised model that simulates realistic Indian urban patterns. The average radial profile was used to compare real against simulated cities to validate the performance of CityGAN. We also used the K-means technique to cluster the radial profiles; the optimal number of clusters, chosen by a straightforward fraction-of-sum-of-squares argument, was 10 for both actual and synthetic scenes. Landscape structures of the generated cities were measured in terms of human settlement indices using spatial landscape metrics. Then, various supervised machine learning models were implemented to predict transportation indices from human settlement indices based on real city datasets. All the regression models were compared based on error measurement metrics. The Kernel Ridge Regression (KRR) model outperformed the benchmark regression methods. The transportation index estimated by the KRR model is compared with actual test data (from real imagery). KRR has a comparatively low MAE of 1.39 and a higher R² value of 0.71. Thus, the proposed hybrid model can be used to predict the transportation index, in terms of road network density, for CityGAN-generated towns and cities. Our proposal can be treated as a versatile decision-support system for sustainable planning and development of new small and medium-sized towns and cities in terms of transportation infrastructure. The proposed method is useful for establishing a relationship between coverage measures of transportation and HSIs, but the current work does not consider any connectivity measure. As future work, linking connectivity measures with HSIs will be worth exploring.

Methods
In this section, we introduce the components of the hybrid framework. First, we discuss the correlation measures used in this study. Next, we describe the two components of our two-step pipeline: the widely used GAN model (CityGAN in this case) and the KRR model, a nonlinear shrinkage method for regression modeling. Finally, we present the proposed RidgeGAN method.

Correlation analysis
Correlation coefficients (CC) are popular statistical measures used to determine the strength of a relationship between two or more variables (which can be numerical or categorical). Pearson's correlation coefficient (PCC) is the most commonly used classical method of measuring linear associations, and its ease of use is advantageous 56. However, the efficiency of such classical measures may be limited when dealing with non-normal, noisy, closed, or open data (even after applying log ratios to the data). The Chatterjee Correlation Coefficient (CCC) 57, a recently developed method based on cross-correlation between ranked increments, is a reliable alternative to traditional correlation methods. CCC can deal with data that contains outliers or has non-normal distributions, and it does not make any assumptions about the data distribution 56. A Python implementation of CCC is available in the "TripleCpy" package. We can define CCC mathematically as follows. Given a pair of random variables $(X, Y)$, suppose the realizations $X_i$ and $Y_i$ have no ties. Rearrange the pairs as $(X_{(1)}, Y_{(1)}), \ldots, (X_{(n)}, Y_{(n)})$ such that $X_{(1)} \le \cdots \le X_{(n)}$. Let $r_i$ be the rank of $Y_{(i)}$; then CCC is defined using the formula
$$\xi_n(X, Y) = 1 - \frac{3 \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1},$$
where $n$ is the number of observations.

Once the relationships between TI and HSIs are assessed, it is easier to select the "best" regression model, that is, one that explains the variability in the response variable with a lower prediction error. The ridge regression (RR) method can directly handle multicollinearity in the data along with the instability of least squares estimators 58. Ridge regression was developed to address some of the shortcomings of the least squares method (overfitting and multicollinearity) 59. A more effective nonlinear regularized regression technique in machine learning is kernel ridge regression (KRR). One of its advantages is that the kernel implementation allows it to handle nonlinearity in the data [60][61][62].
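The CCC definition in this subsection can be illustrated with a minimal sketch, assuming paired samples without ties. It is written in plain Python and does not use the "TripleCpy" package mentioned in the text; the function name `chatterjee_cc` is hypothetical.

```python
def chatterjee_cc(x, y):
    """Chatterjee's rank correlation for paired samples without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # Sort the pairs by x and keep the y values in that order.
    y_by_x = [yi for _, yi in sorted(zip(x, y))]
    # r[i] is the rank (1 = smallest) of the i-th y value in that order.
    order = sorted(range(n), key=lambda i: y_by_x[i])
    r = [0] * n
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    # Sum of absolute differences between consecutive ranks.
    total = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


# Illustrative data only (not from the study): a non-monotone relationship.
x_demo = [-3, -2, -1, 0, 1, 2, 3]
y_demo = [(v - 0.1) ** 2 for v in x_demo]
ccc_demo = chatterjee_cc(x_demo, y_demo)
```

Unlike PCC, the coefficient is not symmetric in its arguments and does not carry a sign, so it measures how strongly Y depends on X rather than the direction of the relationship.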

CityGAN: Generative Adversarial Networks for modeling urban patterns
GAN is a powerful unsupervised deep learning model that learns representations of input data to fit high-dimensional complex distributions 18. GANs are revolutionary in that they can produce very high-quality (i.e., extremely realistic) samples compared to predecessor models at similar computational costs. In general, a GAN is a system that consists of two neural networks competing against each other in a zero-sum game context 63. The two neural network architectures are a generator (G) and a discriminator (D), and together they can generate new data that conforms to learned patterns through both generative and adversarial processes. GANs demonstrate promising performance in modeling complex geospatial data having spatial dependence 22. While the core applications of GANs have been in computer vision and image processing 18, their use in geoscience has provided urban planners with novel ways of generating "new" samples that can easily outperform state-of-the-art geostatistical tools. In this study, we deploy CityGAN 23 to learn the urban patterns from real settlement images and generate hyper-realistic urban settlement images. G and D are both deep convolutional neural networks with weight vectors $\theta_G$ and $\theta_D$. Back-propagation is used to learn these weights by alternately reducing the following loss functions

Here, the generator is made up of numerous convolutional blocks, including inverse-convolutional, batch normalization, and rectified linear unit (ReLU) layers, and ends with a hyperbolic tangent layer (which applies the tanh(·) nonlinearity to each element of the produced map). Recent modifications of GANs have allowed performing conditional generation as domain transformation. In the GAN training phase, it is worth noting that the generator network is usually able to create realistic samples, whereas the discriminator is an auxiliary network that gets discarded after training. Once the GAN is trained, CityGAN 23 can be used to generate new synthetic urban images that can be used for a variety of applications such as urban planning, disaster response, and simulations. Iteratively optimizing the G and D networks is part of the training process. The generator network attempts to deceive the discriminator by generating images that resemble real urban images, whereas the discriminator network attempts to correctly classify whether an image is real or fake. The networks are updated based on classification and generation errors until the generator produces images that are indistinguishable from real-world human settlement scenes. $H_{fake}$ is implicitly sampled from the data distribution that the generator tries to imitate when G is at its optimum. However, the GAN-generated images may not be representative of all possible urbanization patterns because they depend on the training dataset and the GAN architecture used. As a result, before using GAN-generated images for any practical application, it is critical to carefully evaluate them and compare them to real-world urban areas.
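The alternating updates of D and G described in this subsection can be illustrated with the standard binary cross-entropy GAN losses. This is a hedged sketch, not the loss used by CityGAN: the non-saturating generator loss is an assumed variant, and the function names and the `eps` parameter are hypothetical. The inputs are discriminator scores in (0, 1) for batches of real and generated images.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy loss for D: real images labelled 1, generated images 0."""
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss for G: small when D scores generated images as real."""
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)


# Illustrative scores only: a confident discriminator versus a fooled one.
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])
fooled = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

When the discriminator can no longer separate the two batches (all scores near 0.5), its loss approaches 2 ln 2, which corresponds to the point where generated images are indistinguishable from real ones.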

Kernel Ridge Regression (KRR)
Regression modeling is a fundamental area of machine learning in which the target variable is quantitative (real-valued). The most classical approach is linear regression using the ordinary least squares method. However, it has salient disadvantages, e.g., overfitting and multicollinearity, which can be addressed via ridge regression. Ridge regression "shrinks" the least squares coefficients using a regularization parameter by minimizing the objective function
$$\hat{\beta} = \arg\min_{\beta} \; \|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2,$$
where $X \in \mathbb{R}^{N \times D}$ is the feature matrix, with $N$ being the number of training samples and $D$ the number of features, $Y \in \mathbb{R}^{N \times 1}$ is the real-valued target vector, $\beta$ is the vector of regression coefficients, and $\lambda \ge 0$ is the regularizer that helps in dealing with the multicollinearity problem. However, the ridge regression model still has trouble when dealing with nonlinear data 64. A more general framework can be achieved by using a nonlinear mapping function $\varphi(\cdot)$ that maps low-dimensional features to a high-dimensional space (which helps in learning nonlinear patterns). A kernel function in the form of a dot product is then used to avoid the curse of dimensionality of the nonlinear transformation. Mathematically, the kernel between two points, say $x_m$ and $x_n$, is given by
$$k(x_m, x_n) = \varphi(x_m)^\top \varphi(x_n),$$
which satisfies Mercer's condition 65. The major benefit of using a kernel in ridge regression is that it allows the identification of nonlinear functional relationships between one variable and the remaining features. In this study, we use the radial basis function (RBF) kernel 62, which is defined by
$$k(x_m, x_n) = \exp\left(-\gamma \|x_m - x_n\|^2\right),$$
where $\gamma$ is the width parameter of the kernel. The prediction of the KRR model for a new test input $x_*$ is given by
$$\hat{y}(x_*) = k_*^\top (K + \lambda I_N)^{-1} Y,$$
where $K$ is the $N \times N$ kernel matrix with entries $K_{mn} = k(x_m, x_n)$ and $k_* = (k(x_*, x_1), \ldots, k(x_*, x_N))^\top$. We use KRR to establish the relationship between HSIs and TI as shown in Fig. 2 (d).
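To make the KRR prediction step concrete, the following is a minimal NumPy sketch of kernel ridge regression with an RBF kernel in its standard dual form. The parameterization exp(-gamma * ||a - b||^2), the function names, the hyperparameter values, and the toy data are all assumptions for illustration; they are not the study's settings or its city dataset.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all row pairs of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def krr_fit(X, y, lam, gamma):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, gamma):
    """Prediction for new inputs: k(x*, X_train) @ alpha."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha


# Toy data (synthetic): a nonlinear one-dimensional target.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0])
alpha = krr_fit(X, y, lam=1e-3, gamma=1.0)
y_hat = krr_predict(X, alpha, X, gamma=1.0)
```

In a workflow like the one described here, the rows of `X` would hold settlement indices and `y` the network density; `lam` and `gamma` would normally be chosen by cross-validation on the training split.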

Hybrid model: RidgeGAN
RidgeGAN is a hybrid approach based on the unsupervised CityGAN and supervised KRR models. KRR 62 has a built-in mechanism to perform nonlinear regularized regression in the presence of multicollinearity. CityGAN 23 has become popular for generating fake city images (i.e., possible future cities) that, by learning urban morphology, look like real cities on visual inspection and are statistically significant. After implementation, we evaluated the performance of CityGAN by comparing the real and simulated cities using the average radial profile, one of the most widely used spatial summary statistics in urban analysis, together with a peak-search algorithm. Each city is represented as a 10.5 × 10.5 km image covering the urban center and surrounding regions. Quantifying the transportation index plays a vital role in sustainable city planning and management. Here, we built a supervised KRR model to predict the transportation index by learning the relationship between urban patterns and the road transportation index. The KRR prediction model is integrated with the CityGAN model to predict the transportation index of cities newly generated by CityGAN. In short, our hybrid model combines two models: an unsupervised learning model for generating urban patterns and a supervised learning model to predict the transportation index.
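The radial-profile comparison mentioned above can be sketched as follows. This section does not give the exact definition used in the study, so the sketch assumes a common one: the mean built-up fraction in concentric rings around the image center, followed by a simple local-maximum search. The function names, the number of rings, and the peak rule are hypothetical.

```python
import numpy as np


def average_radial_profile(image, n_bins=20):
    """Mean pixel value in concentric rings around the image center."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    yy, xx = np.indices((h, w))
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dist = np.hypot(yy - cy, xx - cx)
    # Ring edges from the center out to the largest inscribed circle.
    edges = np.linspace(0.0, min(h, w) / 2.0, n_bins + 1)
    ring = np.digitize(dist, edges) - 1  # ring index of each pixel
    profile = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = ring == b
        if mask.any():
            profile[b] = image[mask].mean()
    return profile


def find_peaks_simple(profile):
    """Indices of strict local maxima, a stand-in for the peak-search step."""
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1] and profile[i] > profile[i + 1]
    ]
```

Applied to real and generated settlement maps with pixel values in [0, 1], profiles of this kind can be compared directly or clustered, as the text describes for validating CityGAN.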
To sum up, the workflow of the proposed RidgeGAN is detailed as follows (also see Fig. 2 for a schematic workflow):
• First, we apply CityGAN, an unsupervised learning model, to generate small and medium-sized Indian cities using the available urban morphological features.
• Landscape structures of real and generated cities are measured in terms of Human Settlement Indices (HSI) using spatial landscape metrics.
• We assess the relations between two important features of urban forms (human settlement and transportation system) and build a KRR model to predict the transportation index, namely network density.
• The proposed hybrid model framework can predict the road network density on a given urban pattern for the urban universes generated in the first step.
(b) showing visualization of human settlement patterns and corresponding transportation networks (also see Fig.1(a) and 1(c)).

Figure 1. (b) Geographical distribution of 503 small and medium-sized Indian cities included in the study (red colored square grids indicate the selected cities); (a) and (c) are examples of human settlement and transportation maps of two random cities, namely Calicut from the state of Kerala and Panihati from the state of West Bengal.

Figure 2. Prediction framework of the proposed hybrid RidgeGAN model: (a) Implementing an unsupervised learning model (CityGAN) to generate small and medium-sized Indian cities; (b) Landscape structures of generated cities are measured in terms of human settlement indices (HSI) using spatial landscape metrics; (c) Characteristics of the road network and landscape structures of real cities are measured in terms of HSI and transportation index (TI); (d) Assessing the relations between the settlement patterns and transportation system and building a supervised learning model to predict the transportation index for GAN-generated urban universe.

Figure 3. Comparison of real urban built land use maps (a) and synthetic maps (b) generated by CityGAN. The pixel values in each case are in the range [0, 1], where 1 represents the portion of land occupied by buildings. Names of the cities are reported in (a) in yellow text.

Figure 4. (a) Average radial profile map of Bhagalpur city; (b) Human settlement maps of two example cities (top row) along with their average radial profiles (bottom row), where the blue dots represent the peaks found by the peak-search algorithm; (c) Clustered radial profiles of real and synthetic settlement patterns for determining the number of clusters.

Figure 5. (a) Comparison of the distribution of the number of peaks of real and generated cities; (b) Comparison of the distribution of average radial profile classes of real and generated cities; (c) The typical radial profiles for real and generated Indian cities (similar profile)

Figure 6. Heat maps of the correlation between transportation indices and human settlement indices based on the input data: (a) PCC and (b) CCC.

Figure 7. (a) Google satellite view of Bhagalpur city (randomly selected city to explain) in Bihar state (b) human settlement map with the method of computing their average radial profiles.

Table 1. Human settlement indices of small and medium-sized towns/cities in India calculated for the year 2019

Table 2. City number (n), the total length of the road network (L), city buffer area (A), and network density (ND)

Table 4. Performance metrics for the test set of the real dataset. The best model's values are highlighted in bold.

multicollinearity difficulties within datasets, the KRR regression model performs best. Validation metrics indicate that our model is good at predicting network density for urban patterns. This implies that the supervised KRR model can be applied to predict TI for cities newly generated by CityGAN.
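The four validation scores named in the prediction section (MSE, MAE, R-Squared, and Adjusted R-Squared) can be computed as in the sketch below. It uses the standard textbook definitions; the function name and the `n_features` argument (the number of predictors in the regression) are hypothetical, and the numbers in the example are illustrative rather than values from Table 4.

```python
def regression_scores(y_true, y_pred, n_features):
    """Return MSE, MAE, R-squared and adjusted R-squared for one model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mse = ss_res / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R-squared penalizes the number of predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"MSE": mse, "MAE": mae, "R2": r2, "Adjusted R2": adj_r2}


# Illustrative values only.
scores = regression_scores([1, 2, 3, 4], [1, 2, 3, 5], n_features=1)
```

Computing the same dictionary on the held-out 20% test split for each candidate model would give a comparison of the kind summarized in Table 4.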