Semantic segmentation of PolSAR image data using advanced deep learning model

Urban area mapping is an important application of remote sensing that aims at estimating both the extent of, and changes in, land cover within urban areas. A major challenge in analyzing Synthetic Aperture Radar (SAR) based remote sensing data is that highly vegetated urban areas and oriented urban targets produce backscatter very similar to that of actual vegetation. This similarity leads to misclassification of such urban areas as forest cover. The present work is a precursor study for the dual-frequency L- and S-band NASA-ISRO Synthetic Aperture Radar (NISAR) mission and aims at minimizing the misclassification of such highly vegetated and oriented urban targets into the vegetation class with the help of deep learning. In this study, three machine learning algorithms, Random Forest (RF), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), have been implemented along with the deep learning model DeepLabv3+ for semantic segmentation of Polarimetric SAR (PolSAR) data. It is a general perception that a large dataset is required for the successful implementation of any deep learning model, but in SAR based remote sensing a major issue is the unavailability of a large benchmark labeled dataset for training deep learning algorithms from scratch. In the current work, it has been shown that the pre-trained deep learning model DeepLabv3+ outperforms the machine learning algorithms on the land use and land cover (LULC) classification task even with a small dataset, using transfer learning. The highest pixel accuracy of 87.78% and overall pixel accuracy of 85.65% have been achieved with DeepLabv3+; Random Forest performs best among the machine learning algorithms with an overall pixel accuracy of 77.91%, while SVM and KNN trail with overall accuracies of 77.01% and 76.47% respectively.
The highest precision of 0.9228 is recorded for the urban class in the semantic segmentation task with DeepLabv3+, while the machine learning algorithms SVM and RF give comparable results with precisions of 0.8977 and 0.8958 respectively.

Synthetic Aperture Radar (SAR) is a type of active sensor that generates its own energy, transmits it in the form of electromagnetic waves, and then receives a part of this energy after its interaction with the earth's surface 1. In SAR based remote sensing, wavelength plays a crucial role since it determines how the wave interacts with the earth's surface: the longer the wavelength, the greater the penetration depth 2. The SAR sensor also collects signals in different polarizations. By analyzing these polarization signals, one can identify the structure of the imaged surface with good accuracy based upon the dominant scattering type: odd bounce, even bounce, or volume scattering. Target decomposition methods are used for classifying the PolSAR data into these different scattering types 3. Urban land cover mapping is becoming increasingly important since rapid urbanization has caused steady degradation of urban vegetation 4. Accurate classification of urban land cover is therefore required for better urban planning and to avoid overstressing nature. While decomposition methods are used to generate a false-color composite image based on the backscatter values, they often misclassify some urban areas as vegetation due to the diffused scattering from oriented urban targets. This misclassification can be corrected with the help of semantic segmentation using advanced machine learning algorithms. Semantic segmentation involves assigning a class label to every pixel in the image 5,6.
A comparative study by Gupta et al. 7 using SAR imagery compared different classification algorithms: texture-based, SAR-observables based, probability density function (pdf) based, and color based, and then fused all of these using the RF classifier. This fusion-based RF classifier outperforms the individual classification algorithms, with an overall accuracy of 88%. Another comparative study by Camargo et al. 8 using SAR imagery for land-cover classification of the Brazilian Savanna forest showed that the machine learning algorithms SVM, RF, and Multi Layer Perceptron (MLP) give better results than Naïve Bayes and J48, with SVM clocking the highest overall accuracy. On the contrary, a similar study by Lapini et al. 9 for a Mediterranean forest area showed that the RF classifier achieves higher accuracy than the SVM classifier; the reason could be the imbalanced number of samples among classes.
One of the core components in the image segmentation task is feature extraction. Texture descriptor matrices measure information about spatial positioning and intensity, due to which they are extensively used to extract spatial features of the SAR image 10. Texture analysis is generally divided into three categories: transform-domain analysis, statistical analysis, and model-based analysis 11. In transform-domain analysis, wavelet transforms are used to analyze SAR images at different orientations. Because of its geometric and stochastic properties, the Discrete Wavelet Transform (DWT) allows for an efficient multiresolution representation of SAR images 12, and the Gabor transform is a variant of the Fourier transform, defined as the Fourier transform multiplied by a Gaussian function 13. Commonly used statistical analysis methods include the Grey-Level Co-occurrence Matrix (GLCM) and the Multilevel Local Pattern Histogram (MLPH). The GLCM computes image properties related to second-order statistics such as energy, entropy, contrast, homogeneity, and correlation 14. The MLPH outlines the size distributions of dark, bright, and homogeneous patterns appearing in a moving window at various contrasts 15. In model-based analysis, the Markov Random Field (MRF) is widely used to analyze spatial information between neighboring pixels. To represent spatial logic relationships, the conditional random field model is integrated with a multiscale region connection calculus model 16. In the current work, we have used the Gabor filter, Median filter, Gaussian filter, and Canny edge detector to extract the SAR image features. These features, combined with the RGB bands of the false-color composite image, are used to train the machine learning models for the task of semantic segmentation.
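The filter bank described above can be sketched with NumPy and SciPy. The kernel sizes and parameters below are illustrative choices, not the ones used in this study, and a Sobel gradient magnitude stands in for the full Canny detector:

```python
import numpy as np
from scipy import ndimage

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lam=8.0):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine carrier."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates by theta
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def extract_features(band):
    """Stack per-pixel texture features for one image band."""
    feats = [
        band,                                            # raw intensity
        ndimage.gaussian_filter(band, sigma=2),          # Gaussian smoothing
        ndimage.median_filter(band, size=5),             # speckle-robust median
        ndimage.convolve(band, gabor_kernel()),          # Gabor texture response
        np.hypot(ndimage.sobel(band, 0), ndimage.sobel(band, 1)),  # edge strength
    ]
    return np.stack(feats, axis=-1)

band = np.random.rand(64, 64).astype(np.float32)
X = extract_features(band)            # (64, 64, 5) feature cube, one row per pixel
```

Flattening `X` to shape `(4096, 5)` yields the per-pixel feature vectors that a classifier such as RF, KNN, or SVM can consume.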
Optical multispectral remote sensing has successfully demonstrated its potential in feature identification and land use/land cover classification [17][18][19][20]. The major limitation of optical remote sensing is its dependency on the top-surface color of an object. The portion of the electromagnetic spectrum used in optical multispectral remote sensing is the reflected light in the visible range (0.4-0.7 μm) and the reflected infrared (0.7-3 μm). This range is limited to the top-surface color information of objects, and these short wavelengths are easily blocked by atmospheric cloud. Active imaging microwave remote sensing is carried out in the 1 mm to 100 cm wavelength range of the electromagnetic spectrum, which is sensitive to the structural and electrical properties of an object. Synthetic Aperture Radar remote sensing has the advantage of scattering-based object characterization to identify structural and electrical properties, and it has the further advantage of active data acquisition in dark environments, so SAR-based earth observation can acquire data at night too 21. Polarimetric Synthetic Aperture Radar (PolSAR) is an advanced SAR remote sensing technique in which the fully polarimetric properties of the transmitted and received electromagnetic waves are used to characterize the scattering behaviour of different objects/scatterers within a SAR resolution cell 22. SAR backscatter is a coherent sum of the scattering contributed by all the scatterers within a resolution cell, and the retrieval of information about different scatterers within a resolution cell is not possible with single or dual-polarised data. All four polarimetric combinations of PolSAR data together provide the necessary scattering information through mathematical modelling of the interaction of electromagnetic waves with different types of objects.
Several polarimetric decomposition models have been developed to provide at least three scattering elements within a resolution cell [23][24][25] .
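As a minimal illustration of how three scattering components yield a false-color composite, the classic coherent Pauli decomposition (a simpler relative of the model-based decompositions cited above, not the decomposition used in this work) maps even-bounce, volume-like, and odd-bounce terms to the R, G, and B channels; the complex channels below are synthetic placeholders:

```python
import numpy as np

def pauli_rgb(hh, hv, vv):
    """Pauli false-color composite from complex scattering channels.
    R ~ |HH - VV| (even bounce), G ~ |2 HV| (volume-like), B ~ |HH + VV| (odd bounce)."""
    r = np.abs(hh - vv) / np.sqrt(2)
    g = np.sqrt(2) * np.abs(hv)
    b = np.abs(hh + vv) / np.sqrt(2)
    rgb = np.stack([r, g, b], axis=-1)
    return rgb / (rgb.max() + 1e-12)   # normalise to [0, 1] for display

# synthetic single-look scene standing in for real SAR channels
rng = np.random.default_rng(0)
hh, hv, vv = (rng.normal(size=(32, 32)) + 1j * rng.normal(size=(32, 32))
              for _ in range(3))
img = pauli_rgb(hh, hv, vv)
```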
The polarimetric features are another important feature descriptor for PolSAR image segmentation. They depend strongly on the scattering values captured after single-bounce, double-bounce, and volumetric scattering. PolSAR decomposition is used to describe and classify the scattering from man-made targets, natural targets, and landscapes by splitting the received signal into a sum of different scattering mechanisms representing specific polarimetric signatures 23,26. The KNN is one of the popular classifiers used for PolSAR image segmentation. It is highly sensitive to the value of 'K' (the number of neighbors) and is not efficient for a large volume of training data; a large number of polarimetric features can make it a lazy learner 27. In Table 1, it can be observed that both its training and inference times are higher than those of all the other models. Overall, the KNN model is less sensitive for all the labels, as shown in Figs. 1 and 2, in comparison to the other models. The RF model is more robust in handling outliers and noise. Its main disadvantage is that a large number of trees and the selection of other parameters can make the model too slow and ineffective for real-time predictions 8,28. In Table 1, it can be observed that its inference time is higher than that of the SVM and DeepLabv3+ models. From Figs. 1 and 2, it can be observed that the RF model achieves good accuracies for the urban and ground classes but is less sensitive for water and forest. The Support Vector Machine (SVM) is also a nonparametric classification model, insensitive to the distribution of the underlying dataset. The selection of a suitable kernel, the tuning of kernel parameters, and the relatively complex mathematics of the SVM model restrict its effectiveness, and a large training set can also affect the performance of the model 29. From Figs. 1 and 2, it can be observed that the SVM model provides better segmentation for the water and urban class labels, due to its kernel and sensitivity, than the RF and KNN models, but its results remain below those of the DeepLabv3+ model.
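The three classifiers can be compared on per-pixel feature vectors with scikit-learn. The feature dimensionality and labels below are synthetic stand-ins for the filter-bank features used in this work, and the hyperparameters are illustrative defaults:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# toy per-pixel feature vectors (e.g. RGB + filter responses) and class labels
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 7))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # separable toy labels
X_train, y_train, X_test, y_test = X[:500], y[:500], X[500:], y[500:]

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", C=1.0),
}
for name, model in models.items():
    model.fit(X_train, y_train)                  # fit each pixel classifier
    acc = accuracy_score(y_test, model.predict(X_test))
```

In practice each row would be a pixel's feature vector and each label a land cover class, with accuracy replaced by the pixel-accuracy and F1 measures reported later.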
The Deep Neural Network (DNN) models include deep Boltzmann machines 30, Convolutional Neural Networks 31,32, Deep Belief Networks (DBNs) 33, and Stacked Auto Encoders (SAEs) 34. It has been proved that DNN models can learn higher and more abstract levels of features, which can amplify the discriminative information and suppress irrelevant variations in the input data 35. Therefore, the DNN framework has improved the state-of-the-art in many computer vision tasks as well as in remote sensing applications 36,37. Previously, Lv et al. 38 developed a DBN for land cover mapping in urban areas using SAR data. After that, Liu et al. 39 designed a Wishart DBN to preliminarily classify PolSAR data and adopted local spatial information to adjust the classified results, and Gong et al. 40 also explored deep learning for SAR image classification. In the DeepLabv3+ model, the deep convolutional network Xception serves as the backbone, and atrous convolution is used in the last stage of the backbone to control the size of the feature map 40. Here, Xception has been chosen as the backbone because it performs better than other architectures such as ResNet-101 on benchmark datasets like ImageNet 43. Overall, the DeepLabv3+ model can retrieve complex textural patterns and polarimetric informative features from PolSAR image data. A detailed comparison of the models is illustrated in Table 1. One strong limitation of the DeepLabv3+ model is its high dependency on the amount of training data, i.e., ground truth; this work mitigates that limitation to some extent with the support of the transfer learning technique. Following the improved performance of deep learning methods in the field of computer vision, researchers from the remote sensing community have made a great effort to implement land use land cover classification using deep learning. Land use land cover maps of an area help users and the remote sensing community to understand the current landscape.
They enable the monitoring of temporal dynamics of forest conversion, agricultural ecosystems, surface water bodies, and biomass estimation, and help to minimize the human and economic loss from floods, tsunamis, and other natural calamities. Since a large amount of data is captured by various satellite sensors on a regular basis, a repository of big data is created. To extract information from this big data and classify it as per requirements, advanced deep learning models are required that are computationally fast and cost-efficient. The objective of this work is the semantic segmentation of PolSAR data using a deep learning technique, and this work is a part of the NASA-ISRO Synthetic Aperture Radar (NISAR) mission's L and S-band Airborne SAR Research Announcement (RA) project on Implementation of Evolutionary Computing Algorithm for Polarimetric SAR Data Processing and Classification (TEC-07) 44.
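The atrous (dilated) convolution used in the DeepLabv3+ backbone, mentioned above, can be illustrated in plain NumPy: spacing the taps of a small kernel `rate` pixels apart enlarges the receptive field without adding parameters. This is a sketch of the operator only, not of the full model:

```python
import numpy as np

def atrous_conv2d(image, kernel, rate=2):
    """Correlate `image` with `kernel` whose taps are spaced `rate` pixels
    apart (rate=1 reduces to an ordinary sliding-window correlation)."""
    k = kernel.shape[0]
    pad = rate * (k // 2)                     # keep output the same size
    padded = np.pad(image, pad)
    out = np.zeros(image.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * padded[i * rate:i * rate + image.shape[0],
                                         j * rate:j * rate + image.shape[1]]
    return out

img = np.random.rand(16, 16)
mean_k = np.full((3, 3), 1 / 9)              # 3x3 averaging kernel
wide = atrous_conv2d(img, mean_k, rate=3)    # 3x3 taps span a 7x7 neighbourhood
```

With `rate=3`, the nine kernel taps cover a 7 × 7 region, which is how DeepLabv3+ enlarges its receptive field while keeping the feature map resolution under control.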
Conclusively, this study aims to create a new dataset for semantic segmentation of PolSAR images captured from Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) for Land Use Land Cover applications and effectively implement machine learning and advanced deep learning algorithms. This has been achieved using a small dataset with the help of transfer learning. Another aim of this study is to address the problem of oriented urban targets with the help of advanced machine learning techniques and DeepLabv3+ has addressed this issue quite well.

Results
The results of the semantic segmentation task for patch 1 and patch 2 are presented through Figs. 1 and 2. It can be observed that DeepLabv3+ generates more accurate land cover maps compared to traditional machine learning algorithms. In Fig. 1, areas A and B correspond to a river that falls under the water class, but all the traditional algorithms have predicted a small ground patch in these two areas, while DeepLabv3+ has correctly predicted the water class. Further, area C contains a small forest, as is clear from the decomposed SAR image, but it has been very sparsely captured by the RF and KNN algorithms and almost missed by the SVM algorithm, as depicted in Fig. 1c-e. However, the forest has been correctly captured by DeepLabv3+, as visible in Fig. 1f. As shown in Fig. 2, areas D and E correspond to oriented urban buildings, which are misclassified as vegetation (green color) in the decomposed image. This affects the predictions of the traditional machine learning algorithms as well; KNN is worst affected, resulting in maximum misclassification of the oriented urban targets as vegetation. However, it is clear from Fig. 2f that DeepLabv3+ can correctly map the oriented urban targets under the urban class with minimal misclassification. Many performance evaluation metrics are used to evaluate image segmentation, such as accuracy, the Kappa coefficient, and the F1 score. In this work, a pixel accuracy measure has been used to evaluate the performance of the semantic segmentation task. In addition to pixel accuracy, the F1 score has been reported for the resulting land cover maps. The precision metric for the urban class has also been reported, since one of the aims of this work is to address the issue of oriented urban targets, which are generally misclassified as vegetation. Figure 3 presents the performance measures of the implemented algorithms. As shown in Fig. 3a, for test patch 1 (scene 1) the highest pixel accuracy of 83.51% has been achieved by the DeepLabv3+ algorithm, and the KNN classifier has achieved the highest pixel accuracy of 78.68% among the machine learning algorithms, with both the Random Forest and SVM classifiers recording a pixel accuracy below 75%, as reported in Table 1. For test patch 2 (scene 2), as shown in Fig. 3b, the highest pixel accuracy of 87.78% has been achieved by the DeepLabv3+ model. DeepLabv3+ also recorded the highest overall pixel accuracy, as shown in Fig. 3c, while the machine learning algorithms Random Forest, KNN, and SVM recorded overall pixel accuracies of 77.92%, 76.48%, and 77.02% respectively. It is observed that the performance of RF improves for patch 2, which contains more urban region, compared to patch 1, which has a good contribution from all the classes, because RF fails to map the river water accurately in patch 1. Also, the performance of the KNN algorithm drops for patch 2, mainly because it fails to correctly pick up the oriented urban targets and mislabels them as vegetation. In terms of the F1 score, DeepLabv3+ recorded the highest F1 score of 0.8520, while the machine learning algorithms RF, KNN, and SVM recorded F1 scores of 0.7784, 0.7762, and 0.7646 respectively, as illustrated in Fig. 3d. To evaluate the performance of all the algorithms specifically on urban targets, we also calculated the overall precision for the urban class; DeepLabv3+ recorded the highest precision of 0.9228, as shown in Table 1, while all the machine learning algorithms recorded a precision value of less than 0.90 for the urban class in the same region. This shows that the DeepLabv3+ model can correctly classify the oriented urban targets that are misclassified by the decomposition algorithms.
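The reported measures can be computed directly from the predicted and reference label maps; a minimal sketch (the class ids and toy label maps below are arbitrary illustrations):

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, cls):
    """Pixel accuracy over all classes, plus precision and F1 for class `cls`."""
    acc = float(np.mean(y_true == y_pred))               # fraction of matching pixels
    tp = np.sum((y_pred == cls) & (y_true == cls))       # true positives
    fp = np.sum((y_pred == cls) & (y_true != cls))       # false positives
    fn = np.sum((y_pred != cls) & (y_true == cls))       # false negatives
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, f1

# toy 'urban' (1) vs 'other' (0) label maps
truth = np.array([[1, 1, 0], [0, 1, 0]])
pred  = np.array([[1, 0, 0], [0, 1, 1]])
acc, prec, f1 = segmentation_metrics(truth, pred, cls=1)
```

A high urban-class precision, as reported for DeepLabv3+, means that pixels labeled urban are rarely false alarms, which is exactly the behaviour needed for the oriented-target problem.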
Challenges and limitations of SAR data. A radar instrument, especially SAR, has several advantages over optical multispectral sensors. Since SAR sensors do not depend on the Sun's illumination for Earth observation, imaging can be done day and night. The longer wavelength is an added advantage that can provide information on land/ocean features even in cloudy conditions. Apart from these advantages, SAR has various challenges and limitations. The main challenge in SAR data processing for land use/land cover classification and other thematic applications is to ensure a distortion-free backscatter image 45. Mainly three types of distortion affect SAR data quality: radiometric, geometric, and polarimetric distortions. Appropriate calibration approaches are implemented to minimize the effect of distortions in the SAR imagery 46. Radiometric calibration is performed with internal and external calibrators to find the actual return of the radar signals from the objects present on the earth's surface [46][47][48]. The backscatter image represents the radar cross-section obtained after scattering from all the objects within the resolution cell [49][50][51]. The concerned space agencies generally provide the radiometric calibration formula to ensure the correct radar return from the different object classes in the SAR data, because it is not possible for all researchers and users to evaluate the radiometric properties with the help of active and passive calibrators. Before implementing any classification approach or modelling for thematic applications, radiometric calibration is applied to the Single Look Complex (SLC) or intensity/amplitude data 52,53. Geometric distortions in spaceborne SAR occur mainly due to the undulating topography of the imaged surface 54.
Due to the side-looking imaging characteristics of a SAR system, layover, shadow, and foreshortening may occur in the SAR image during data acquisition over hilly or undulating terrain. The geometric distortions mainly affect the SAR image in the range direction 51. The undulating terrain results in geometric distortions and affects the radiometric quality of the SAR data. Precise radiometric calibration of SAR data requires the angle of incidence of the electromagnetic waves transmitted by the SAR system onto the terrain surface. Generally, the angle of incidence is provided with the SAR metadata, but it is valid only if the topography of the surface is flat. If the terrain is undulating, the actual angle of incidence will differ from the angle provided with the SAR metadata 55,56. To find the actual incidence angle for radiometric calibration and to minimize geometric distortions in the SAR data, high-resolution digital elevation models are required 54,57. The side-looking property of the SAR sensor not only causes geometric distortions in undulating terrain but also creates slant-range ambiguity 50. Due to slant-range ambiguity, the actual shape and size of the ground targets cannot be represented in the SAR imagery. The slant-to-ground-range conversion of the data helps to assign correct ground geometry to the features imaged by the SAR system. Precise terrain correction algorithms are implemented with external high-resolution DEMs to minimize radiometric and geometric distortions in the SAR image 58,59. The use of an external DEM also helps in creating a simulation-based layover-shadow mask for the orthorectified SAR data of undulating terrain 60,61. The layover-shadow mask provides detailed information on the portion of the undulating terrain affected by the geometric distortions due to layover and shadow effects.
The radiometric normalisation and orthorectification processes successfully minimize the radiometric and geometric distortions in the SAR data 54. To retrieve all the scattering information contributed by different objects in a small area, fully polarimetric quad-pol data are used, because retrieval of all the scattering elements within a resolution cell is mathematically and practically rigorous with single or dual-polarised SAR data 62,63. The remote sensing community has widely used fully polarimetric quad-pol SAR data to implement model-based decomposition approaches for the retrieval of scattering parameters [64][65][66]. The accuracy of polarimetric SAR model-based scattering retrieval is affected by the polarimetric distortion of the SAR data 67,68. The polarimetric distortions are cross-talk, channel imbalance, and Faraday rotation 69,70. Of these three PolSAR distortions, Faraday rotation is most pronounced in low-frequency spaceborne SAR data, and this distortion can be removed by implementing an appropriate approach 71. Several algorithms have been developed to minimize the effects of cross-talk and channel imbalance in PolSAR data 72. After minimising the radiometric, geometric, and polarimetric distortions, the scattering retrieved from the mathematical modelling and decomposition of the SAR data shows appropriate scattering behaviour for different objects 46,73,74. Smooth surfaces like barren land and water bodies can be characterised as surface scatterers, with a difference in surface scattering power 24. Forest vegetation shows a dominance of volume scattering in the PolSAR decomposition, and urban settlements show a very high amount of double-bounce scattering power 26,75.
Since distortion-free polarimetric SAR data help in identifying the scattering of different features of the imaged area, the implementation of classification approaches on these data gives fruitful results with reliable accuracy. This work implemented land use land cover classification on fully polarimetric quad-pol UAVSAR data after applying G4U decomposition, treating the vegetation covers as a forest class; the urban class is represented by the combination of dense and sparse building areas, including rural areas, industrial plants, and inner-city areas with high-rise buildings. The achieved urban-class precision of 92.28% using the DeepLabv3+ model is a major contribution of this work.

Discussion
The present study provides an extensive analysis of the popular machine learning classifiers Random Forest, K-Nearest Neighbor, and Support Vector Machine and the deep learning model DeepLabv3+ for semantic segmentation of Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) data captured over a part of Houston, Texas (USA). This fully polarimetric data, with a high resolution of 2 m, provides concrete information for the training of machine learning models. Another advantage of using the UAVSAR dataset is that pre-processing steps like calibration and multi-looking do not need to be performed, since the data available on the NASA-JPL platform is already radiometrically calibrated and multi-looked. As mentioned earlier, the major hindrance to large-scale accurate land cover mapping is the unavailability of a sufficiently large common benchmark dataset for training deep learning models. Even when data are available, ground truth is unavailable most of the time, and it is not possible to validate the results obtained without proper ground truth, leading to limitations on its usage. Also, labeling a large dataset for training a deep learning model from scratch is a very time-consuming and cumbersome task that can take up to hundreds of man-hours. This issue can be overcome with the help of the transfer learning method using a small dataset, as shown in this work. The results obtained by the transfer learning method using a pre-trained DeepLabv3+ model are better than those obtained from the popular traditional machine learning algorithms. Apart from the smooth overall image segmentation results, the deep learning model is also able to correctly classify the oriented urban targets, while the traditional machine learning algorithms failed to pick them up and misclassified them into the vegetation class.
This is because deep neural networks can extract textural features better than traditional algorithms. Upon close observation of the PolSAR data, and the false-color composite images in particular, it can be seen that although the color of oriented urban targets is similar to that of the vegetation/forest region, they are texturally different: vegetated areas appear smoother in the image, unlike urban features, which show patterns due to the streets running between the urban blocks. The deep learning encoder based on the Xception architecture can highlight this dissimilarity much better than traditional manual feature extraction methods, resulting in the better performance of the deep learning model compared to traditional machine learning algorithms. The transfer learning method has a drawback of negative transfer arising from the difference between the dataset used to pre-train the model and the dataset used during transfer learning; it works best only when the two datasets are similar enough. Since the pre-trained model used in this study is trained on the ImageNet dataset, which consists of real-life camera images, there could be some negative transfer due to the dissimilarity of the ImageNet dataset from the PolSAR data, but currently there is no measure to quantify this. In the future, with the availability of a large amount of open-source PolSAR data, the training dataset can be enlarged to train a deep learning model from scratch and possibly achieve even more accurate results.

Conclusions
Land Use Land Cover mapping is important to address ever-increasing environmental concerns like climate change, soil degradation, deforestation, and water pollution. But this need for highly accurate land cover maps is hindered by the unavailability of a common benchmark labeled dataset for training deep learning models. The present work demonstrated that transfer learning combined with deep learning models can be effectively used to generate accurate land cover maps and classifications even with a small training dataset. Transfer learning not only addresses the issue of unavailability of a large amount of labeled data in the field of SAR based remote sensing but also takes less computational time and fewer resources compared to training a deep learning model from scratch. In both targeted test patches, DeepLabv3+ clocked the highest pixel accuracies of 83.51% and 87.78%, while among the machine learning algorithms the RF classifier achieved the highest overall pixel accuracy of 77.92% for the segmentation task. It has also been validated through these experiments that the KNN classifier is rightly known as a lazy learner, since it took the maximum training time of around 41,091 s for the overall training dataset and an inference time per scene of around 864.94 s. One of the focus areas of this work was to reduce the misclassification of oriented urban targets as vegetation, and this was successfully achieved using DeepLabv3+, an advanced deep learning model. It can be noted that the current state-of-the-art DeepLabv3+ model works better than the traditional machine learning methods even with small datasets, contrary to the popular belief that large datasets are required for deep learning models to achieve better results than traditional machine learning algorithms.
In the future, this work can be extended to training and validating the DeepLabv3+ model from scratch subject to the availability of a common benchmark dataset in a sufficient amount.

Methods
Data. The data used in this research are taken from the open access platform of NASA-JPL 76. They are fully polarimetric L-band data from the airborne sensor UAVSAR, which has a resolution of 2 m. The target area of Houston shown in Fig. 4 has been chosen as it provides a variety of land cover features. Houston is a large metropolitan city in the state of Texas, USA. Most of its area consists of nearly level, clayey and loamy prairie soils. The L-band Uninhabited Aerial Vehicle Synthetic Aperture Radar data were used to test the potential of machine learning algorithms for semantic segmentation tasks. A part of the Houston metropolis was imaged by the UAVSAR fully polarimetric quad-pol SAR system. Figure 4b shows a false-color composite image of the G4U decomposition-based output for the study area. The green color highlights the features responsible for volume scattering. Double-bounce or even-bounce scatterers, which contribute toward high backscatter, are shown in red, and surface scattering or odd-bounce scattering is represented in blue. Details about the UAVSAR instrument and the data used are given in Table 2.

Methodology. The UAVSAR quad-pol ortho-rectified data is provided as a covariance matrix (C3), which is converted to a coherency matrix (T3). Generally, SAR data are influenced by speckle noise, which creates problems in object detection and classification. To minimize speckle in PolSAR data, various specialized filters have been developed; among them, the Lee filter is one of the best known and most effective speckle filters and is used in this work for polarimetric speckle filtering of the UAVSAR data. After speckle filtering, PolSAR decomposition is applied to generate a false-color composite image, which is used to create a training dataset for our ML models after breaking it into smaller patches. One of the first widely accepted model-based decompositions was proposed by Freeman and Durden 63. This model decomposed the SAR backscatter into three different scattering types, namely odd-bounce scattering, double-bounce scattering, and volume scattering. Later, an improved model with an additional scattering type, helix scattering, was proposed by Yamaguchi 77. Based on the Yamaguchi decomposition, Singh et al. 78 proposed a general four-component decomposition method with a unitary transformation (G4U) that makes use of the full polarimetric information. In previous models the T13 component of the coherency matrix was not used, but this method allows its usage. It also includes an extended volume scattering model which discriminates volume scattering between dihedral and dipole scattering structures caused by the cross-polarized HV component. This leads to enhanced double-bounce scattering from urban targets compared to the existing model-based decompositions, which is why the G4U decomposition method is used in the present work. The four-component decomposition is expressed in Eq. (1), where Ps, Pd, Pv, and Pc are the scattering powers to be determined.

⟨[T]⟩ = Ps [T]s + Pd [T]d + Pv [T]v + Pc [T]helix (1)
[T]s, [T]d, [T]v, and [T]helix are the expansion matrices corresponding to surface, double-bounce, volume, and helix scattering respectively. The false-color composite image is sub-divided into smaller chunks of 500 × 500 pixels, and 20 patches are labeled using the GIMP tool to create a dataset for training and testing the machine learning algorithms and the DeepLabv3+ model. Of these 20 patches, 18 were used for training while 2 were kept for testing and validation. After generating the required dataset, manual feature extraction is carried out for the machine learning models using a Gabor filter, Gaussian filter, Canny edge detector, and Median filter, whereas for DeepLabv3+ feature extraction is performed automatically by the Xception-based encoder. The complete workflow adopted for the semantic segmentation task is presented in Fig. 5, and the Google image, decomposed false-color composite, and corresponding annotated label for the test patches are shown in Fig. 6. The training and working of the different algorithms are discussed separately below.
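The patch-generation step can be sketched as follows; the image size and the NumPy stand-in for the false-color composite are illustrative, not the dimensions of the actual UAVSAR scene.

```python
import numpy as np

def to_patches(img, size=500):
    """Split an H x W x C composite into non-overlapping size x size
    patches, discarding incomplete patches at the borders."""
    h, w = img.shape[:2]
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

# Stand-in for the decomposed false-color composite (real scene is larger).
rgb = np.zeros((1200, 1700, 3), dtype=np.uint8)
patches = to_patches(rgb, size=500)
print(len(patches))  # -> 6 (2 full rows x 3 full columns)
```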
The Gabor filter is represented by Eq. (2) which was originally given by Gabor 79 .
where h_x is a bandpass filter and h_y is a Gaussian filter. The Gaussian filter 80 is represented by Eq. (3), where σ² is the variance of the Gaussian filter and l (−l ≤ x, y ≤ l) is the size of the filter kernel.
In the implementation of the Canny edge detector 81,82 , the image first undergoes Gaussian filtering, and then the gradient magnitude and direction are calculated using Eq. (4), where Gx(i, j) and Gy(i, j) are the partial derivatives at point (i, j) in the x- and y-directions respectively.
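The gradient step of Eq. (4) can be sketched with NumPy central differences; the 5 × 5 step-edge image is an illustrative stand-in, not data from the study.

```python
import numpy as np

def gradient_mag_dir(img):
    """Gradient magnitude and direction (Eq. 4) via finite differences;
    in the full Canny pipeline this follows Gaussian smoothing."""
    gy, gx = np.gradient(img.astype(float))  # partial derivatives in y, x
    mag = np.hypot(gx, gy)                   # sqrt(Gx^2 + Gy^2)
    theta = np.arctan2(gy, gx)               # edge direction in radians
    return mag, theta

# A vertical step edge: the gradient is non-zero only across the step.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
mag, theta = gradient_mag_dir(img)
print(mag[2, 2], mag[2, 0])  # -> 0.5 0.0
```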

Machine learning methods.
Machine learning enables a computer to make independent, intelligent decisions through supervised or unsupervised learning. Commonly used machine learning algorithms for classification tasks are Random Forest, K-Nearest Neighbour, and Support Vector Machine.
Random forest classifier was first proposed by Breiman 28 in 2001. It is based on Bagging predictors 83 proposed by Breiman himself in 1996. In a random forest all the trees work independently of each other, so the algorithm can be run in parallel. The Random Forest classifier is very robust towards outliers and noise, but it requires a large training dataset. Assume a dataset {r1, r2, …, rn} ∈ R^(L×n), where L is the number of features and n is the number of samples. ri represents the position of sample i in the space R^(L×n), while z denotes the relationship between ri and rj, with z ∈ Z = {1, −1}: if ri and rj belong to the same class then z = 1, and if they belong to different classes then z = −1. For the mth tree, the prediction is fp(r) = f(r, θm), where the θm are independent, identically distributed random vectors and each tree votes for a class. The margin function for a random forest is defined in Eq. (7), where j ≠ z; if ml(r, z) > 0, a correct result is obtained.

K-Nearest Neighbor has proved to be a powerful nonparametric classifier 27 . As per the Nearest Neighbor rule, consider a set of n pairs (x1, θ1), …, (xn, θn), where the xi take values in a metric space X upon which a metric d is defined, and the θi take values in the set {1, 2, …, M}. Each θi is the index of the category to which the ith individual belongs, and each xi is the outcome of the set of measurements made upon that individual. Given a new pair (x, θ), where x is the measured quantity and θ is to be estimated from the information available in the set of correctly classified points, x′ ∈ {x1, x2, …, xn} is called the nearest neighbor to x if it satisfies Eq. (8). The Nearest Neighbor rule assigns x to the category θn′ of its nearest neighbor xn′; a mistake occurs if θn′ ≠ θ. The rule utilizes only the classification of the nearest neighbor.
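The Nearest Neighbor rule can be written as a minimal NumPy sketch; the two synthetic clusters and their labels are invented for illustration, not drawn from the PolSAR dataset.

```python
import numpy as np

def nn_classify(x, X, labels):
    """Nearest Neighbor rule (Eq. 8): x is assigned the category of the
    training point x' minimizing the metric d(x, x')."""
    return labels[np.argmin(np.linalg.norm(X - x, axis=1))]

# Two well-separated synthetic clusters stand in for labeled pixels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
print(nn_classify(np.array([0.1, -0.2]), X, labels))  # -> 0
print(nn_classify(np.array([2.9, 3.1]), X, labels))   # -> 1
```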
The remaining n − 1 classifications θi are ignored. The Support Vector Machine classifier acts as a parametric classifier when used as a linear SVM, while the advanced SVMs used to classify complex non-linear data are non-parametric. A complex dataset can be made linearly separable in a high-dimensional feature space using kernels such as the radial basis function (RBF), but the computation is not carried out in that high-dimensional feature space: by using kernels, all necessary computations are performed directly in the input space 29 . Three kernels are mainly used with non-parametric SVMs, the polynomial kernel, the RBF kernel, and the sigmoid kernel, as represented by Eqs. (9), (10), and (11).
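The three kernels can be written directly in NumPy; the parameter values d, σ, κ, and θ below are illustrative defaults, not the values tuned in this study.

```python
import numpy as np

def poly_kernel(x, y, d=2, theta=1.0):
    """Polynomial kernel, Eq. (9): (x.y + theta)^d."""
    return (np.dot(x, y) + theta) ** d

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel, Eq. (10): exp(-||x - y||^2 / 2 sigma^2)."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    """Sigmoid kernel, Eq. (11): tanh(kappa * x.y + theta)."""
    return np.tanh(kappa * np.dot(x, y) + theta)

x = np.array([1.0, 0.0])
print(rbf_kernel(x, x))  # identical vectors give the maximum value 1.0
```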
Eq. (6) gives the classification decision of the forest and Eq. (7) its margin function:

C(r) = arg max_{j∈Z} P(j|r) (6)

ml(r, z) = P(z|r) − max_{j∈Z, j≠z} P_m(j|r) (7)

The three SVM kernels are:

Polynomial kernel: k(x, y) = (x · y + θ)^d (9)

RBF kernel: k(x, y) = exp(−‖x − y‖² / 2σ²) (10)

Sigmoid kernel: k(x, y) = tanh(κ x · y + θ) (11)

Semantic segmentation assigns a class label to every pixel of an image 85 . A general semantic segmentation architecture consists of an encoder-decoder network. The encoder can be a classification network such as VGG, ResNet, or Xception, while the decoder projects the low-resolution result of the encoder back to a high-resolution pixel space by upsampling. Encoder-decoder networks have been used successfully for the semantic segmentation task [86][87][88] .

DeepLabv3+
DeepLabv3+ is an extension of the DeepLabv3 model of the DeepLab series. The overall encoder-decoder structure of DeepLabv3+ is represented in Fig. 7. The encoder module encodes multi-scale contextual information with the help of atrous convolutions, while the decoder module refines the segmentation results along the object boundaries. DeepLabv3+ uses a modified Xception model as the encoder, as shown in Fig. 8. Earlier DCNNs had the limitation that the repeated use of max-pooling and striding at every step significantly reduced the spatial resolution of the output feature maps 88 . To overcome this problem, atrous convolution, originally developed for computing the undecimated wavelet transform 89 , is used; it allows the computation of responses at any desired resolution 88 . A convolution operation is defined by three parameters (kernel size, stride, and padding), while an atrous convolution, also known as dilated convolution, has an additional parameter, the dilation (atrous) rate r, which is the spacing between the kernel elements and determines the stride with which the input is sampled. Atrous convolution allows a wider field of view at the same computational cost. A comparison between normal convolution and atrous convolution is presented in Fig. 9.
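The widening field of view can be demonstrated with a 1-D NumPy sketch; the function and the input array are illustrative (the model itself uses 2-D atrous convolutions).

```python
import numpy as np

def atrous_conv1d(x, w, rate=1):
    """1-D 'valid' atrous convolution: the kernel taps are spaced `rate`
    samples apart, so the receptive field grows with no extra weights."""
    span = (len(w) - 1) * rate + 1  # effective field of view
    out = np.array([np.dot(x[i:i + span:rate], w)
                    for i in range(len(x) - span + 1)])
    return out, span

x = np.arange(10, dtype=float)
w = np.ones(3)                           # a 3-tap kernel
_, span1 = atrous_conv1d(x, w, rate=1)   # standard convolution, span 3
y2, span2 = atrous_conv1d(x, w, rate=2)  # same 3 weights, span 5
print(span1, span2)  # -> 3 5
```

With rate = 2 the same three weights see five input samples, which is exactly the "wider field of view at the same computational cost" property used by the encoder.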
To capture contextual information at multiple scales, parallel atrous convolutions with different rates, known as Atrous Spatial Pyramid Pooling (ASPP), were integrated in DeepLabv3. The useful semantic information is present in the last feature map, but information related to object boundaries is lost due to the convolution operations with striding. This can be remedied by applying atrous convolution to extract denser feature maps, but doing so is computationally expensive and inefficient. In encoder-decoder models, by contrast, no features are dilated in the encoder path, and the sharp object boundaries are gradually recovered in the decoder path, leading to faster computation. To combine the advantages of both methods, DeepLabv3+ extends the DeepLabv3 model by adding a simple but effective decoder module for recovering sharp object boundaries. It has been reported that depthwise separable convolution reduces the computational complexity in terms of Multiply-Adds by 33% to 41% while maintaining similar performance, which results in improved speed and accuracy when Xception is adopted as the backbone 90,91 .
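Where the savings come from can be seen by counting Multiply-Adds for a single layer. The layer shape below is an arbitrary example; the per-layer saving exceeds the network-wide 33-41% figure because the latter averages over all operations in the model.

```python
def standard_madds(h, w, cin, cout, k):
    """Multiply-Adds of a standard k x k convolution on an h x w map."""
    return h * w * cin * cout * k * k

def separable_madds(h, w, cin, cout, k):
    """Depthwise separable version: per-channel k x k depthwise
    convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * cin * k * k
    pointwise = h * w * cin * cout
    return depthwise + pointwise

std = standard_madds(64, 64, 256, 256, 3)
sep = separable_madds(64, 64, 256, 256, 3)
print(round(100 * (1 - sep / std)))  # -> 88 (percent fewer Multiply-Adds)
```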
The Xception model used as the backbone architecture in the current work has 14 blocks, whose details are given in the Appendix. To better understand the working of this deep learning model, a visualization of the feature maps corresponding to the first hidden layer of each block has been generated. Figure 10 shows the feature maps extracted from Block 1, Block 7, and Block 14 respectively; the features extracted from all 14 blocks are given in the Appendix. It can be noted that the initial layers work on simple, coarse features while the higher layers work on finer details. The model has been implemented in the Python programming language with supporting libraries such as numpy, scipy, sklearn, opencv, and pandas, and the TensorFlow framework, executed through Anaconda (version 4.9.2) on a workstation with an NVIDIA Quadro P4000 GPU and 48 GB RAM. Transfer learning is easy to implement with TensorFlow in Python using the pre-trained model available on the GitHub page of DeepLabv3+ 42 . Since the pre-trained model is trained on the ImageNet dataset, which has a much higher number of classes than our dataset, a few changes have been made to the original DeepLabv3+ code to implement this model via transfer learning on our custom dataset, as discussed in the Appendix.
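The transfer-learning recipe can be sketched in TensorFlow/Keras. This is not the DeepLabv3+ repository code: the frozen Xception backbone, the new 1 × 1 classification head, the bilinear upsampling, and the class count of 6 are all illustrative assumptions (in practice weights="imagenet" would load the pre-trained weights).

```python
import tensorflow as tf

NUM_CLASSES = 6  # assumed number of LULC classes in the custom dataset

# Pre-trained encoder; weights=None here only keeps the sketch offline.
backbone = tf.keras.applications.Xception(
    include_top=False, weights=None, input_shape=(256, 256, 3))
backbone.trainable = False  # freeze the pre-trained feature extractor

# Replace the ImageNet classifier with a segmentation head for our classes.
x = tf.keras.layers.Conv2D(NUM_CLASSES, 1)(backbone.output)  # 1x1 head
x = tf.keras.layers.UpSampling2D(32, interpolation="bilinear")(x)
model = tf.keras.Model(backbone.input, x)
print(model.output_shape)  # per-pixel class scores at input resolution
```

Freezing the backbone means only the small new head is trained, which is what makes transfer learning viable on a dataset of only 20 labeled patches.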