Analysis of regional economic development based on land use and land cover change information derived from Landsat imagery

The monitoring of economic activities is of great significance for understanding the level of regional economic development and for policymaking. As the carrier of economic activities, land is an indispensable production factor in economic development, and economic growth leads to increased demand for land as well as to changes in the form of land utilization. As an important means of earth observation, remote-sensing technology can capture the land use and land cover change (LUCC) information related to economic activities. This study proposes a method for analysing regional economic situations based on remote-sensing technology, comprising LUCC information extraction, sensitivity factor selection, model construction and accuracy evaluation. The approach was validated with experiments in Zhoushan City, China. The results show that the economic statistical indices are most sensitive to the construction land area, that the average correlation coefficient between the actual data and the predicted data is 0.949, and that the average mean relative error is 14.21%. Therefore, this paper suggests that LUCC could be utilised as an explanatory indicator for estimating economic development at the regional level, and that the potential applications of remotely-sensed imagery in economic activity monitoring are worth pursuing.

The monitoring of economic activities is of great significance for understanding economic situations and for supporting policymaking concerned with sustainable development and management 1. Because economic activities cumulatively change the surface of the Earth, data that reveal these surface changes enable frequent and large-scale observation of economic activity, which could substantially improve understanding of the actual economic situation and the prediction of its trends. Traditional data collection methods such as mapping and ground surveying are time-consuming and costly 2. Additionally, the information is not updated frequently and is difficult to access 3,4. Remotely-sensed imagery, which provides an up-to-date and realistic representation of the Earth's surface, is a powerful data source for research on economic activity monitoring. Remote-sensing technology is an effective means of observing surface changes on the Earth owing to its fast and wide-range imaging capability 5-10. Since the 1970s, terrestrial Earth observation data have been continuously collected at various spectral, spatial and temporal resolutions 11,12. In recent decades, the accessibility, quality and scope of these data have continuously improved, making them a fundamental information source for studying and visualising pattern changes of the Earth's surface, as well as important data for research on monitoring human activities 13,14.
With their capability to detect low levels of visible and near-infrared (VNIR) radiance at night, the Defense Meteorological Satellite Program-Operational Linescan System (DMSP-OLS) night-time light (NTL) data provided a new scope for measuring human economic activities 15-20. These NTL data are free and feature a wide spatial coverage from −180° to 180° longitude and −65° to 75° latitude, thus greatly enhancing NTL application research 21. As an objective reflection of human activities, NTL data provide a cost-effective and spatially [...] (2) sensitivity factor selection (in "Sensitivity factor selection" section), (3) model construction (in "Model construction" section) and (4) accuracy evaluation (in "Accuracy evaluation" section). The software packages used for this study were the Environment for Visualizing Images (ENVI) for image processing, ArcGIS and MATLAB for analysing and presenting the results, and Statistical Product and Service Solutions (SPSS) for statistical analysis.
LUCC information extraction. The LUCC information is obtained from remotely-sensed time series data. First, the remotely-sensed images were pre-processed by radiometric calibration, atmospheric correction and image cropping. The digital numbers in the raw data were converted to top-of-atmosphere (TOA) reflectance using the calibration parameters provided in the Calibration Parameter Files (CPF), and the influence of atmospheric scattering and absorption was reduced using Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) in ENVI. Then, based on the spectral and spatial resolutions of the images, training samples were selected and the maximum likelihood classification (MLC) algorithm, a supervised classification method, was applied to obtain the land use types and their areal coverage within the study area. Considering the visual separability of different ground objects, training samples were selected for six classes: construction land, water, bare land, forest, tidal flat and crop land. The samples were divided into two parts, two-thirds for classification and one-third for accuracy assessment. Classification accuracy was assessed using the overall accuracy and the Kappa coefficient: the overall accuracy measures how well the classified pixels match the ground truth data, while the Kappa coefficient measures how much better the classification is than a chance assignment of pixels to the land cover classes. Finally, because the remotely-sensed images do not cover all years, the data for missing years were obtained by linear interpolation using Eq. (1).
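$$\mathrm{Data}_{INI+i} = \mathrm{Data}_{INI} + \frac{i}{TER - INI}\left(\mathrm{Data}_{TER} - \mathrm{Data}_{INI}\right) \qquad (1)$$
where $\mathrm{Data}_{INI+i}$ is the value for the ith year after the initial year INI, $\mathrm{Data}_{INI}$ and $\mathrm{Data}_{TER}$ are the values for the initial year INI and the termination year TER, respectively, and the data for the years between INI and TER are missing.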
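The interpolation step can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard straight-line form between two consecutive image years; the function name `interpolate_missing_years` and the sample areas are hypothetical and are not taken from the study.

```python
def interpolate_missing_years(known):
    """Fill the years between known observations by linear interpolation.

    known: dict mapping year -> value for the years that have imagery.
    Returns a dict covering every year from the first to the last known year.
    """
    years = sorted(known)
    filled = {}
    for ini, ter in zip(years, years[1:]):
        # Constant yearly increment between the initial and termination year.
        step = (known[ter] - known[ini]) / (ter - ini)
        for i in range(ter - ini):
            filled[ini + i] = known[ini] + i * step
    filled[years[-1]] = known[years[-1]]
    return filled


# Hypothetical areas (km^2) for two image years; not values from the paper.
areas = interpolate_missing_years({2000: 40.0, 2004: 60.0})
print(areas)
```

Each pair of consecutive image years is treated as its own (INI, TER) interval, so the series may contain any number of gaps.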
In the study, the overall accuracy and Kappa coefficient were computed by Eq. (2) using the confusion matrix, a square array whose rows and columns express the number of sample units (i.e., pixels, clusters of pixels, or polygons) assigned to a particular category relative to the actual category as verified on the ground 55-57.
$$OvAc = \frac{1}{N}\sum_{i=1}^{r} x_{ii}, \qquad \hat{k} = \frac{N\sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+}\,x_{+i}}{N^{2} - \sum_{i=1}^{r} x_{i+}\,x_{+i}} \qquad (2)$$
where $OvAc$ and $\hat{k}$ are the overall accuracy and the Kappa coefficient, respectively, $r$ is the number of rows in the matrix (the total number of categories), $x_{ii}$ is the number of observations in row i and column i (the number of correctly classified pixels among the samples used for accuracy assessment), $x_{i+}$ and $x_{+i}$ are the marginal totals of row i and column i, respectively, and $N$ is the total number of observations (the total number of pixels in the samples used for accuracy assessment).
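As a worked illustration of the overall accuracy and Kappa coefficient, the following sketch computes both from a confusion matrix given as a list of rows. The function name and the 2 x 2 matrix are hypothetical; the study itself used six classes.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Kappa) for a square confusion matrix."""
    r = len(matrix)
    n = sum(sum(row) for row in matrix)               # total observations N
    diagonal = sum(matrix[i][i] for i in range(r))    # sum of x_ii
    row_totals = [sum(matrix[i]) for i in range(r)]   # x_i+
    col_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]  # x_+i
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    overall = diagonal / n
    kappa = (n * diagonal - chance) / (n * n - chance)
    return overall, kappa


# Hypothetical two-class confusion matrix (rows: classified, columns: reference).
overall, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
print(f"overall accuracy = {overall:.2f}, kappa = {kappa:.2f}")
```

For this matrix, 85 of 100 samples lie on the diagonal, and the chance-agreement term is 50 x 55 + 50 x 45 = 5000, which gives a Kappa of 0.70.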
Sensitivity factor selection. The interaction mechanism between economic growth and LUCC information is complex. Eleven economic indices were selected to describe economic status: gross domestic product (GDP), value-added of primary industry (VPI), value-added of secondary industry (VSI), value-added of tertiary industry (VTI), per capita GDP (PGDP), fixed assets investment (FAI), total tourist income (TTI), gross industrial output value (GIOV), gross agricultural output value (GAOV), gross planting output value (GPOV) and gross forestry output value (GFOV). The land category that is most relevant to each economic statistical index must then be selected as the sensitivity factor used to construct the model for estimating socioeconomic situations. The correlation coefficients between the economic statistical indices and the land use types were computed by Eq. (3) using correlation analysis. For each economic statistical index, the most relevant land use type was selected as the sensitivity factor, i.e., the explanatory variable in the model.
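$$r_{LUCC_m-ESI_n} = \frac{\sum_{i=1}^{N}\left(x_{LUCC_m,i} - \bar{x}_{LUCC_m}\right)\left(x_{ESI_n,i} - \bar{x}_{ESI_n}\right)}{\sqrt{\sum_{i=1}^{N}\left(x_{LUCC_m,i} - \bar{x}_{LUCC_m}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(x_{ESI_n,i} - \bar{x}_{ESI_n}\right)^{2}}} \qquad (3)$$
where $r_{LUCC_m-ESI_n}$ is the correlation coefficient between land use type m (one of construction land, water, bare land, forest, tidal flat and crop land) and economic statistical index n (one of GDP, VPI, VSI, VTI, PGDP, FAI, TTI, GIOV, GAOV, GPOV and GFOV), $x_{LUCC_m,i}$ and $x_{ESI_n,i}$ are the ith observations of land use type m and economic statistical index n, respectively, $\bar{x}_{LUCC_m}$ and $\bar{x}_{ESI_n}$ are the averages of land use type m and economic statistical index n, respectively, and $N$ is the number of paired observations of land use type m and economic statistical index n.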
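The selection step can be sketched as follows. The Pearson coefficient follows the standard definition; ranking land use types by the absolute value of the coefficient is an assumption made for this illustration, and the function names and the short series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def most_sensitive(land_use, index):
    """Return the land use type with the largest |r| and all coefficients."""
    scores = {name: pearson(series, index) for name, series in land_use.items()}
    # Assumption: "most relevant" is read as the largest absolute coefficient.
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores


# Hypothetical annual series; not data from the study.
land_use = {
    "construction": [1.0, 2.0, 3.0, 4.0, 5.0],
    "tidal_flat": [5.0, 4.0, 4.0, 2.0, 1.0],
}
best, scores = most_sensitive(land_use, [2.0, 4.0, 6.0, 8.0, 10.0])
print(best, scores)
```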
Model construction. Regression analysis was applied to model the economic statistical indices from the LUCC information. In order to eliminate heteroscedasticity and to clarify the relationship between the LUCC information and the economic statistical indices more accurately, a base-10 logarithmic transformation was performed to change the range and scale of the data. We attempted to construct a single-factor quantitative model in which each economic statistical index is the dependent variable and the area of a land use type is the independent variable. The model is described by Eq. (4):
$$I_{economic} = f\left(L_{landuse}\right) \qquad (4)$$
where $I_{economic}$ is the value of a given economic statistical index, $L_{landuse}$ is the area of the land use type selected as the sensitivity factor for this economic statistical index, and $f$ is the quantitative model. In this study, after inspecting scatter plots, a series of comparative statistical regression analyses was conducted, including linear, quadratic-term, power and exponential models. Four types of simple quantitative models were constructed, as shown in Table 1.
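The following sketch shows one way the four model families could be fitted in pure Python. The functional forms (y = a + bx, y = a + bx + cx^2, y = a x^b, y = a e^{bx}) are assumed here as the usual curve-estimation forms and are not confirmed by the text above, and fitting the power and exponential models by log-linearisation is likewise an assumption that requires positive x and y. All names and data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b * mean_x, b


def fit_quadratic(x, y):
    """Least squares for y = a + b*x + c*x^2 via the normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [p - factor * q for p, q in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def fit_models(x, y):
    """Fit the four assumed model forms and return them as callables."""
    a, b = fit_line(x, y)
    qa, qb, qc = fit_quadratic(x, y)
    # Power and exponential fits use log-linearisation (needs x > 0, y > 0).
    ln_pa, pb = fit_line([math.log(v) for v in x], [math.log(v) for v in y])
    ln_ea, eb = fit_line(x, [math.log(v) for v in y])
    pa, ea = math.exp(ln_pa), math.exp(ln_ea)
    return {
        "linear": lambda v: a + b * v,
        "quadratic": lambda v: qa + qb * v + qc * v * v,
        "power": lambda v: pa * v ** pb,
        "exponential": lambda v: ea * math.exp(eb * v),
    }


# Hypothetical lg-transformed data lying exactly on y = 0.5 + 2x.
lg_area = [1.0, 1.5, 2.0, 2.5, 3.0]
lg_index = [2.5, 3.5, 4.5, 5.5, 6.5]
models = fit_models(lg_area, lg_index)
print({name: round(f(2.0), 3) for name, f in models.items()})
```

Because the sample data are exactly linear, the linear and quadratic fits reproduce them, while the power and exponential fits only approximate them.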
Accuracy evaluation. Accuracy evaluation was performed to validate the model. For the model validation data, the independent variable was inserted into the regression model to obtain the estimated value, which was then compared with the actual value. The relative error (RE) is the ratio of the absolute error to the actual value and reflects the deviation of the model prediction from the actual value. The mean relative error (MRE) was used to evaluate the overall accuracy of the models. The formulas are as follows:
$$RE = \frac{\left|y_{e} - y_{a}\right|}{y_{a}}, \qquad MRE = \frac{1}{n}\sum_{i=1}^{n} RE_{i} \qquad (5)$$
where $y_{e}$ and $y_{a}$ are the model estimated value and the actual value, respectively, and $n$ is the number of actual values.
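A short sketch of the error measures, assuming positive actual values; the function names and the two sample pairs are hypothetical.

```python
def relative_errors(estimated, actual):
    """Relative error |y_e - y_a| / y_a for each pair (assumes y_a > 0)."""
    return [abs(e - a) / a for e, a in zip(estimated, actual)]


def mean_relative_error(estimated, actual):
    """Mean of the relative errors over all validation samples."""
    errors = relative_errors(estimated, actual)
    return sum(errors) / len(errors)


# Hypothetical estimated and actual values; not data from the study.
mre = mean_relative_error([110.0, 190.0], [100.0, 200.0])
print(f"MRE = {mre:.1%}")
```

Here the two relative errors are 10% and 5%, so the MRE is 7.5%.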

Study area and data
Study area. Founded in 1987, Zhoushan is the first prefecture-level city in China that consists of islands; specifically, the Zhoushan Archipelago, which consists of 1390 islands with areas greater than 500 m² 58. Zhoushan City is located on the coast of the East China Sea, west of Hangzhou Bay and north of Shanghai (Fig. 2). It has a total administrative area of approximately 22,200 km², but a land area of only 1140.12 km². Zhoushan has abundant marine economic resources and is well known for its marine fishery, tourism, international shipping and shipbuilding industries 58. In 2011, the Zhoushan Archipelago New District was established. This was the first national strategic-level new district in China with a marine economy theme.
Zhoushan is characterised by hilly landforms, with numerous mountains and hills on the islands. Thus, land that can be effectively used is scarce. For this reason, the land development intensity of the islands varies greatly and the core zones of the city are located on the larger islands. Zhoushan Island is the largest island in Zhoushan City and also its economic and political center; it has an area of 502.65 km², an east-west length of 44 km and a north-south width of 18 km 59. The study area presented in Fig. 2a includes Zhoushan Island, Changzhi Island, Aoshan Island, Xiaogan Islands and Lidiao Islands. The total land area of the study region is 529.38 km². This area is the core zone of Zhoushan and has experienced dramatic LUCC due to the rapid economic growth of the city in recent decades 58. The original Landsat-8 OLI image of the study area, acquired on February 22, 2020, is presented in Fig. 2b. The image is clear and of high quality because of good weather conditions, and the study area includes ocean, lakes, rivers, urban areas, wetlands, forest and other features.
Data. The data used in this study can be classified into 2 groups: remotely-sensed images and regional statistics of the study area. The remotely-sensed image for a particular day of a given year was used to derive the annual LUCC dynamics of the study area using image classification technology. The regional statistics were used to characterise the regional economic development situation for each calendar year.
Remotely-sensed image. We attempted to determine the LUCC information of the study area from remotely-sensed images acquired since the city was established. Given the limitations and constraints in the acquisition and selection of suitable images, Landsat satellite images were used to derive the LUCC information in the study area. Landsat is a series of terrestrial satellites launched by NASA. Since 1972, 8 satellites have been launched, of which Landsat 6 failed to reach orbit. At present, the Landsat satellites have been continuously observing the Earth for more than 40 years and have accumulated large-scale, long-term remotely-sensed imagery, which is widely used in Earth observation research 21,32. Landsat satellites have basically the same observation conditions and 16- or 18-day revisit cycles. In addition, the thematic mapper (TM), enhanced thematic mapper (ETM+) and operational land imager (OLI) on the Landsat satellites are multi-spectral sensors with a spatial resolution of 30 m (except for several spectral bands), which is finer than that of the NTL data.
Table 1. The type and representation of the model, where y is the base-10 logarithm of the economic index, x is the base-10 logarithm of the area of the land use type, and a, b and c are coefficients.
Considering the availability of cloud-free spatial coverage and the consistency of the annual acquisition date, 11 Landsat TM or OLI images spanning 32 years (1984-2016) were used to obtain the multi-temporal LUCC information of the study area (Table 2). The collected images were provided by the US Geological Survey (USGS) (https://glovis.usgs.gov/) and the Geospatial Data Cloud Platform of the Chinese Academy of Sciences Computer Network Information Center (https://www.gscloud.cn). The image format is GeoTIFF and the coordinate system is World Geodetic System 1984 (WGS84) with the Universal Transverse Mercator (UTM) projection. For the Landsat TM images, only the 6 reflective bands with 30-m spatial resolution were used for further data analysis, while the thermal infrared (TIR) band with a coarse spatial resolution of 120 m was excluded. For OLI images, the Pan band and Cirrus band were excluded, while the other 7 bands with 30-m spatial resolution were used.
Socioeconomic dataset. In general, GDP is the most common economic indicator. In this study, we extended the selection of indicators to include those that are, in theory, closely related to LUCC information. We assembled a city-level statistical dataset spanning 32 years (1984-2016) from the statistical yearbook of Zhoushan City, comprising gross domestic product (GDP), value-added of primary industry (VPI), value-added of secondary industry (VSI), value-added of tertiary industry (VTI), per capita GDP (PGDP), fixed assets investment (FAI), total tourist income (TTI), gross industrial output value (GIOV), gross agricultural output value (GAOV), gross planting output value (GPOV) and gross forestry output value (GFOV). GDP is a monetary measure of the market value of all the final goods and services produced in a specific period, and PGDP is the per capita GDP. The primary industry (PI) refers to agriculture, forestry, animal husbandry and fishery (excluding the service industry in agriculture, forestry, animal husbandry and fishery). The secondary industry (SI) refers to mining (excluding mining auxiliary activities), manufacturing (excluding metal products, machinery and equipment repair), electricity, heat, gas and water production and supply, and construction. The tertiary industry (TI) is the service industry, which mainly includes transportation, communications, commerce, catering, finance, education and public services. FAI measures the change in the total spending on non-rural capital investments such as factories, roads, power grids and property in China. TTI refers to the total monetary income obtained by the destination country or region in a certain period from providing tourism products, purchasing goods and other services to tourists at home and abroad. GIOV refers to the total result of the industrial production activities of an industrial enterprise (unit) in a certain period, which is the total value of industrial final products and industrial labor services expressed in monetary form. GAOV is the total amount of all agricultural, forestry, animal husbandry and fishery products expressed in monetary form in a certain period (usually 1 year). GPOV refers to the total amount of plantation and agricultural products expressed in monetary form in a certain period (usually 1 year), and GFOV refers to the total amount of forestry products expressed in monetary form in a certain period (usually 1 year) 13,16,60-63. The details of these indicators are listed in Table 3.

Experimental results and analyses
LUCC information extraction. The Landsat satellite images were pre-processed using several procedures, i.e., radiometric calibration, atmospheric correction and image cropping. In constructing the classification system, the spatial and spectral resolutions of the remotely-sensed images and the features of ground objects in the study area need to be considered comprehensively. The study area has a complex landscape, with hills in the centers of the islands, making the spatial distribution of the objects discrete and resulting in mixed pixels at the spatial resolution of the images. Due to the significant spectral confusion, we grouped several categories together; specifically, grassland was grouped into cropland, and aquaculture and brine pan were grouped into tidal flat. Finally, the LUCC information in the study area was divided into 6 categories: (1) construction land, (2) forest, (3) water, (4) bare land, (5) cropland and (6) tidal flat. Maximum likelihood classification was applied as the supervised classification of the pre-processed Landsat images. The images were visually enhanced using linear contrast stretching and different band combinations to help select training samples. The classification results were modified and corrected in order to eliminate obvious errors. Accuracy assessment was performed for each classification result, with the samples chosen from the repository of Google Earth historical images. The average overall classification accuracy was 84.05%, and the average Kappa coefficient was 0.80, as shown in Table 4.
The final classification maps for each of the 11 years are shown in Fig. 3, and the statistics of each category from 1984 to 2016 are listed in Table 5. Based on the classification results, the areas of the land use types for missing years were obtained by linear interpolation. In Table 5, bold and italics represent the original data and the interpolated data, respectively.
The study area has shown obvious LUCC over the past few decades. First, construction land has increased more than fivefold, while tidal flats and cultivated land/grassland have decreased significantly. The construction land area increased continually over the 32-year study period, from 19.57 to 131.51 km², an increase of 111.94 km² and an increase ratio of 572.03%. Conversely, the tidal flat area decreased by 27.40 km², from 29.35 to 1.95 km², translating to a decrease ratio of 93.35%. The area of cultivated land/grassland decreased by 61.60 km², from 220.70 to 159.10 km², or by 27.91%. Meanwhile, forest land changed relatively little in ratio and area. The forest area increased by 16.06 km², from 210.72 to 226.78 km², translating to an increase of 7.62%, while the water area increased by 5.56 km², from 5.62 to 11.18 km², an increase ratio of 98.80%. Finally, the [...]
Sensitivity factor selection. Given that LUCC is a complex process driven by socioeconomic factors, the primary challenge in estimating economic situations using LUCC information is to determine the association between the areas of the land use types and the economic statistics. Pearson correlation analysis was applied to quantitatively examine the statistical dependence between the area of each land use type and the economic statistics across the study period. The Pearson correlation coefficient (ranging from −1 to 1) was used to indicate the sensitivity level of each land use type versus the economic indices. In addition, the statistical significance level was tested using two-tailed t-statistics.
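For reference, the t-statistic commonly used to test a Pearson coefficient is t = r sqrt((n − 2) / (1 − r²)) with n − 2 degrees of freedom. The sketch below assumes this standard form; the source does not state the exact formula it used, the coefficient 0.9 is hypothetical, and n = 22 is only an illustrative sample size. Converting t to a two-tailed p-value requires the Student t distribution, which is left out here.

```python
import math


def t_statistic(r, n):
    """t-statistic for a Pearson coefficient r computed from n paired samples."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical coefficient and sample size; compare |t| with the critical
# value of the t distribution with n - 2 degrees of freedom.
t_value = t_statistic(0.9, 22)
print(f"t = {t_value:.3f} with {22 - 2} degrees of freedom")
```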
In order to make the LUCC information consistent with the economic statistics, linear interpolation was performed on missing-year data. Consequently, we obtained 33 sets of raw data consisting of the LUCC information and economic statistics for every year from 1984 to 2016. This raw dataset was then logarithmically transformed (in base 10, denoted lg) in order to eliminate the intrinsic exponential growth trend of the economic indicators and to make the data more consistent with the normal distribution, which is an assumption of Pearson correlation analysis. We extracted one-third of the dataset at equal intervals for model validation, with the remaining data used for modelling. The correlation coefficients between each economic index and the area of each land cover type for the data used to construct the model are listed in Table 6. These results reveal that all of the economic indices are positively correlated with construction land, forest and water, and negatively correlated with cropland, bare land and tidal flat. The relevance varies, but it is clear that each economic index is significantly correlated with construction land area; indeed, these coefficients have the highest values among all the land use types. Among all of these land use types, the change of construction land is therefore the best explanatory variable for revealing the trend of economic development in the study area. Thus, construction land was selected as the sensitivity factor for the single-factor quantitative model.
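The transformation and the equal-interval hold-out can be sketched as below. Which years fall into the validation subset is not stated in the text, so taking every third sample starting from the third one (`offset=2`) is an assumption of this example, as are the function names.

```python
import math


def lg(values):
    """Base-10 logarithm of each value (values must be positive)."""
    return [math.log10(v) for v in values]


def split_equal_interval(samples, every=3, offset=2):
    """Hold out every `every`-th sample for validation; the rest is for modelling."""
    validation = [s for i, s in enumerate(samples) if i % every == offset]
    modelling = [s for i, s in enumerate(samples) if i % every != offset]
    return modelling, validation


# The 33 annual samples from 1984 to 2016, represented here by their years.
years = list(range(1984, 2017))
modelling, validation = split_equal_interval(years)
print(len(modelling), len(validation), validation[:3])
print(lg([10.0, 100.0, 1000.0]))
```

With 33 annual samples this yields 22 samples for modelling and 11 for validation, matching the one-third hold-out described above.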

Model construction. In order to reduce the impact of large variations in economic index values in the time series and to improve the accuracy of the regression analysis, a lg-lg regression model was used to estimate the economic indices. Taking the lg of the economic index as the dependent variable and the lg of the area of construction land (ACL) as the independent variable, the lg-lg scatter plots are presented in Fig. 4. The coefficients of the lg-lg regression models were estimated from the data used to construct the model. The specific information is shown in Tables 7, 8, 9 and 10; all of the models are significant, with p < 0.01.
Accuracy evaluation. Model validation data were used to verify the models. The estimated economic indices were derived from the models and then compared to the actual values. The MREs are listed in Table 11, showing that the estimation accuracy varies among the models. For most of the economic indices, including VPI, VTI, PGDP, TTI, GAOV and GPOV, the quadratic-term models have higher precision than the other models. For VSI and GIOV, linear models have the highest precision. For GDP and FAI, power models display the best performance in terms of precision. Overall, in spite of the model differences, GDP, VPI, VSI, VTI, PGDP, FAI and GIOV are better estimated than TTI, GAOV, GPOV and GFOV. The best-fitting model for each economic index was obtained by comparing the MREs of the different models: among the 4 model types, the model with the lowest MRE was selected as the final model for that index. As shown in Table 12 and Fig. 5, the prediction errors of GDP, VTI and PGDP are less than 10%, indicating that these 3 economic indices are quite well estimated by the best-fitting models. For VPI, VSI, FAI, TTI, GIOV, GAOV and GPOV, the errors are within 20%, which is satisfactory. For GFOV, however, the lowest MRE of 28.19% indicates that the models could not fit it accurately.
Overall, the quantitative models could accurately reveal the dynamic changes for most of the economic indicators in this case study.

Discussion
Remotely-sensed images record changes on the Earth's surface and thus can be used to represent human activities and to estimate socioeconomic indicators. Traditionally, NTL data have been the main remotely-sensed data utilised to estimate socioeconomic situations. Few studies, however, have focused on the quantitative relationship between economic development and LUCC information, which is also a reliable indicator for estimating socioeconomic situations.
This study opens up unique opportunities for the objective, seamless understanding of regional economic development from the perspective of land-use/cover change using remotely-sensed time series data, as well as the correction of economic survey data, both with a high degree of accuracy. The results of the case study in Zhoushan City indicated that LUCC information derived from remotely-sensed image could be indicative of dynamics in economic activity during economic development processes at the city level, as revealed by various quantitative correlations with relevant economic statistics. There is good performance in modelling the economic statistics when the area of construction land is selected as the sensitivity factor. The method proposed in this study still contains some deficiencies and uncertainties, however, as a result of the following factors.
1. The LUCC information is the key factor affecting the modelling accuracy, since the sensitivity factors selected from the LUCC information were the basis for the regressions. The spatial resolution of the Landsat imagery was relatively low, and land use types were grouped because of the low separability caused by mixed pixels. It is necessary to use high-resolution images to extract more detailed LUCC information and, at the same time, to use more suitable classification methods to improve classification accuracy.
2. The spatial matching of the remotely-sensed images and the statistical data is another influential factor; it is currently imperfect and requires improvement in future studies. Due to an absence of statistical data precisely matching the remotely-sensed images spatially, we had to utilise LUCC information that covered only the core zone of Zhoushan City when modelling the statistical data at the city level.
3. In the study, only a single factor was included in the modelling, while the correlation analysis showed that several land use types are significantly correlated with the economic indices. Thus, additional factors should be included in the modelling, and the impacts of different land use types on the economic indices need to be analysed.
4. Existing studies have shown that the interaction between LUCC and economic development displays obvious regional differences. This interaction may be affected by many natural and non-natural factors such as land resource conditions, land policy and economic development stage. Therefore, the reliability of the proposed method needs to be further verified by additional case studies in different areas.

Conclusions
From the perspective of the interrelationship between LUCC information and economic development, this study proposed a method for analysing regional economic situations using remotely-sensed images to extract LUCC information. Through a case study of Zhoushan, China's first prefecture-level island city, this research investigated the ability of LUCC information to estimate economic indices. The LUCC information was extracted from Landsat images, and the area of construction land was taken as the explanatory variable after correlation analysis. Eleven economic indices (GDP, VPI, VSI, VTI, PGDP, FAI, TTI, GIOV, GAOV, GPOV and GFOV) were incorporated in linear, quadratic-term, power and exponential models.
The accuracy evaluation revealed that the mean relative errors of the best-fitting models for the 11 economic indices were 6.50%, 14.47%, 14.57%, 7.61%, 7.38%, 14.95%, 17.99%, 14.60%, 17.45%, 12.57% and 28.19%, respectively. In conclusion, the results indicate that LUCC information could be used as an explanatory indicator for estimating economic development at the regional level, and that the potential applications of remotely-sensed imagery in the monitoring of economic activities are worth pursuing.
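As a quick consistency check, the eleven reported MREs can be averaged and compared with the 14.21% average quoted in the abstract. The mapping of values to index names below assumes that "respectively" follows the order in which the indices are listed in the conclusions.

```python
# Best-fitting-model MREs (%) as reported above; the pairing with index
# names assumes the listing order GDP, VPI, ..., GFOV.
best_mre = {
    "GDP": 6.50, "VPI": 14.47, "VSI": 14.57, "VTI": 7.61, "PGDP": 7.38,
    "FAI": 14.95, "TTI": 17.99, "GIOV": 14.60, "GAOV": 17.45,
    "GPOV": 12.57, "GFOV": 28.19,
}

average_mre = sum(best_mre.values()) / len(best_mre)
below_10 = sorted(name for name, value in best_mre.items() if value < 10)

print(f"average MRE = {average_mre:.2f}%")
print("indices with MRE below 10%:", below_10)
```

The values sum to 156.28, which gives an average of 14.21%, and the three indices below 10% are GDP, PGDP and VTI, in agreement with the accuracy evaluation.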
In future research, the remotely-sensed image quality still has room for improvement, and more methods could be applied to the classification process.
In this paper, remote-sensing technology was used to enrich the methods available for the comprehensive analysis of regional economic situations. The study has some deficiencies, and further work should (1) undertake more case studies in different regions in order to verify the reliability and applicability of the proposed method; (2) use remotely-sensed images with higher spatial resolution to obtain more detailed information on land use and cover change and to reduce the impact of mixed pixels on the area statistics of land use types; and (3) use methods such as deep learning to improve the classification accuracy.