GIS-based precise predictive model of mountain beacon sites in Wenzhou, China

In ancient China, where was frequently troubled by invaders, the government set up many beacon towers for alerting and transmitting military information along the border and the coast. Many beacon sites still exist in some areas, which are generally located in dangerous places with high mountains and rough terrain, bringing great difficulties to archaeological discovery. Therefore, it is particularly important to develop a predictive model applicable to the distribution of mountain beacon sites. Taking 68 beacon sites found in Wenzhou as research samples, this study used the superimposed method of logistic regression and viewshed analysis, forming a high-precision, scientific and operational predictive model for the distribution of beacon sites, which was verified by the cross-validation method. The results showed that the beacon site predictive model simulated in this study could reduce the probability scope of site location by 90% compared with the common logistic regression predictive model, which greatly improved the accuracy and ability of site prediction. At the same time, it could also be used to understand the relationship between the known sites and their surroundings to assist in decision-making about conservation and management.

www.nature.com/scientificreports/ to archaeologists. According to the local archaeologists, the mountainous terrain in Wenzhou was complex and rugged, with tight vegetation coverage. In most cases, they could not rely on transportation tools, and the only way to reach the mountaintops was by walking. In extreme weather conditions, there are also many potential dangers. Even after overcoming those physical difficulties, there was no guarantee that new beacon sites would be found in each survey, which greatly reduced the efficiency of archaeological work and was very detrimental to conservation work. Therefore, we intend to simulate a predictive method for the distribution probability of mountain beacon sites to solve these difficulties, reduce the research workload, and improve the efficiency of the site survey. Site probability prediction based on the assumption that the distribution of sites was not random but was related to specific characteristics of the natural environment and factors associated with human activity and norms of human behavior in the past 4 . Over the years, the most commonly used method of site probability prediction has been to build a mathematical model in the research area and calculate the probability values that the site may exist in an unknown area by analyzing the data of known samples 5 . This research originated in the 1950s, when Willey (1953) first applied the predictive model to the regional settlements of Weilu Valley, providing insight into the relationship between site and environment through qualitative analysis 6 . In the 1970s, quantitative analysis emerged, and Green (1973) used multiple linear regression to predict the probability of the existence of prehistoric Mayan sites in Northern British Honduras, laying the groundwork for the subsequent application of site predictive models to cultural resource management 7 . Thanks to the development of computer science, the two disciplines of statistics and spatial analysis skillfully combined in the 1980s led to great advances in site predictive research. Kvamme (1983) was the main advocate of this technique, and he proposed an approach for regional modeling of archaeological site locations based on computer processing techniques applied to digital elevation data 8 . Subsequently, books written by Kohler (1988), Judge & Sebastian (1988) and Kvamme (1988) were published, which were highly influential at that time [9][10][11] . Entering the 1990s, with the emergence of relevant computer software and the continuous promotion of Geographic Information Systems (GIS) in the field of archaeological research, the research methodologies gradually diversified. Märker & Heydari-Guran (2009) used data-mining techniques to predict Paleolithic site locations in the Zagros Mountains of Iran 12 . Casarotto, et al. (2011) proposed a predictive model to simulate an ancient landscape in the Eastern Lessinia by exploiting GIS suitability maps and multi-criteria target analysis 13 . Dorothy Graves (2011) developed a site predictive modelling using Logistic Regression method to target Neolithic settlement and occupation activity in mainland Scotland 14 . Scientific and sophisticated predictive means had vigorously developed site predictive research in terms of both depth and breadth [15][16][17][18] .
Constructing a site predictive model is a complex task that requires effective methodology, appropriate parameters, and a scientific approach of validation 19 . Statistical regularities and distribution characteristics of known sites are identified first based on the analysis of environmental factors in the given area, such as altitude, slope, distance from transport system or hydrographic net; then, a multivariate discriminant function is adopted to assess the existence probability of other sites in this area, giving a potential site distribution map 20 . The application of quantitative analysis makes the study of human-land relationships more accurate and scientific, among which the most widely used is the logistic regression analysis method 21 . This approach is frequently conducted as a useful tool for archaeological investigations, not only providing an important decision support system to obtain usable information but also saving time and money, especially in large areas 21 . Based on the previous research, this study will construct a precise predictive model applicable to the mountain beacon sites by identifying the causal relationship between certain environmental characteristics, human activities and known sites.

Materials and methods
Study area. The Wenzhou (温州) area was chosen as the study area, including the inner city of Wenzhou, as well as the counties and villages under its jurisdiction, with a total area of approximately 12,110 km 222 . Wenzhou is located in East China, southeast of Zhejiang Province, lower reaches of the Ou River, bordering the East China Sea to the east and Fujian Province to the south, which is one of the nationally famous historical and cultural cities (Fig. 1a). Being a coastal city, Wenzhou had been a key place for military defense throughout its history. Especially in the Ming Dynasty, Wenzhou was heavily infested by Japanese pirates with an extremely serious situation, so 58 beacons were built in Yueqing County, Rui'an County and Pingyang County along the coast of Wenzhou to prevent and resist enemies and were increased in the late Ming and Qing Dynasties 23 . By far, archaeologists have discovered 68 beacon sites in the Wenzhou area (Fig. 1b), most of which have fallen into disrepair. The materials used to build the beacon towers were all taken from nature, mostly made of clay and gravel, and rammed layer by layer; some were piled up with bricks and stones, with square, rectangular and round shapes. These beacon sites are historical evidence of Zhejiang coastal people's bravery in fighting against Japanese pirates. Based on the known data, this study will simulate a predictive method applicable to Wenzhou's mountain beacon sites.
Methodology. The site selection of mountain beacon towers was not random, but was a result of logical decisions based on geographical considerations and factors related to human activity. Most of them were found on the mountaintops along the traffic arteries between the forts or on the high hills of the coastal peninsula, where were easy to look out. Moreover, due to the requirement of visual factors, beacon towers were placed to ensure the accessibility of sightlines. Considering the two factors of geography and vision, we proposed to adopt the superimposed method of logistic regression and GIS viewshed analysis to simulate beacon sites' location preferences. Theoretically, this method can effectively improve the prediction accuracy and further narrow the possible scope of unknown beacon sites compared to the single logistic regression predictive method commonly used in the past. The working flow pursued the following steps ( Data sources and extraction. The data used to build the predictive model of mountain beacon sites consisted of two groups: the experimental samples (dependent variables) and the environmental factors (independent variables). First, 168 experimental samples in the survey area were required to study the probability of the dependent variables, including 68 sites and 100 nonsites. The number of experimental samples depends on the size of the study area, usually making the spatial distribution density of sites and nonsites equal. The data of beacon sites were collected from the survey data of the Wenzhou Institute of Cultural Relics and Archaeology (Fig. 1b), and the nonsites were randomly introduced specifically for the predictive model. Conceptually, all the locations where sites were not found can be used as nonsites, so the "Create Random Points" tool of ArcGIS software was used to generate 100 random points as nonsites of the experimental samples. In this study, the experimental samples consisted of 68 discovered beacon sites, and 100 randomly acquired nonsites. The independent variables related to the construction of the predictive model required to be screened out. For the independent variables, natural and cultural factors such as altitude, slope, distance, etc., that potentially influenced the spatial distribution of the sites, were integrated into the predictive model. The selection of independent variables was mainly determined by the mode and features of beacon transmission during the Ming and Qing dynasties, combined with the distribution characteristics of existing sites.
Wenzhou is mostly mountainous, and the terrain generally shows the features of high inland and low coastal areas; hence, the topographical parameters were mainly considered for natural variables. The survey data showed that the discovered beacon sites were mainly located in the coastal area and on the top of the hill with a good view to facilitate the information transmission among beacon towers. In addition, considering the delivery of living supplies and smoke-burning materials, the locational selection of the beacon towers must also consider the easy access. Therefore, the altitude, slope, aspect and topographic relief, that may be linked to their locations, were taken as alternative independent variables for further screening.
Except for topographical factors, the distribution of beacon sites was also probably influenced by cultural factors related to human activity. For example, because the beacon tower relied on smoke to transmit, only the neighboring beacon tower or fort was set within its viewing range so that the information could be transmitted one by one until the soldiers of forts received it, so the distance from nearest beacon tower or fort was inevitably under consideration when constructing the beacon system. Similarly, due to the need for convenient access to materials and resources, the distance between beacon towers and main traffic routes might also be one of the factors to be measured. Therefore, the cultural variables considered in this study mainly included two variables, namely, the distance from traffic routes and the distance from the nearest beacon tower or fort.
In general, six alternative independent variables were selected in this study for further screening, and the relevant data could be extracted using GIS spatial analysis tools (Table 1). Variables related to natural factors, including altitude, slope, aspect and topographic relief, were extracted from the Digital Elevation Model (DEM) data of the Wenzhou area, provided by the Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (http:// www. gsclo ud. cn); variables linked to cultural factors, including the distance  www.nature.com/scientificreports/ from traffic routes and nearest beacon tower or fort, were derived from the historical literature 23 . By superimposing the vector data of traffic routes, beacon towers and forts data with DEM data, extracting through GIS analysis tools, the raster data of each alternative independent variable required in this study were obtained (Fig. 3).

Independent variables screening.
In the second stage, the alternative independent variables were screened by statistically analyzing the degree of correlation between beacon sites, nonsites and each alternative environmental variable, and those that are significantly correlated with the locational choice of beacon sites, were considered as the independent variables. For the five factors of altitude, slope, aspect, topographic relief and distance from traffic routes, superimposing the raster data with the distribution map, the attribute values of all the 68 beacon sites and 100 nonsites were extracted using the "Extract Multi Values to Points" tool of ArcGIS software. The distance from the nearest beacon tower or fort was calculated by the "Near Analysis" tool of ArcGIS software (Supplementary Table S1 and S2). The results obtained were analyzed by Bulked Segregant Analysis method 25 . If the data of an alternative independent variable showed obvious aggregation at the sites and random or opposite distribution at the nonsites, the factor was significantly associated with the locational choice of beacon sites and was screened as the independent variable of the study.
Logistic regression and modeling. After identifying the dependent and independent variables, logistic regression analysis was carried out with the Statistical Product and Service Solutions (SPSS) software. Logistic regression is a nonlinear categorical statistical method used in regression analysis of qualitative variables, and later popularized and widely used in population statistics and prediction 26 . According to the differences in dependent variables, logistic regression can be divided into binary logistic regression and multinomial logistic regression. Binary logistic regression, in which the dependent variable can only take values 0 and 1, has the advantage of dealing with multiple types of independent variables such as classification, sequencing and distance fixing at the same time, which is suitable for analyzing complex variables and the interactions among them 27 . Multinomial logistic regression has three or more dependent variables, which can be regarded as several binary logistic regressions and is suitable for complex cases with multiple dependent variables 28 . In this study, the dependent variables were only nonsites and sites, i.e. 0 and 1, so the binary logistic regression method was adopted to quantitatively analyze the locational preferences of beacon sites in the Wenzhou area.
According to the rule of binary logistic regression modeling, the existing probabilities of beacon sites were translated using a sigmoid cost function to values in the range of [0, 1], as presented in Eq. (1) 29 .
where P is the probabilities of the existence of beacon sites; z is the linear combination, obtained from Eq. (2).
where α is a constant term, indicating the natural logarithm of the ratio when the independent variable takes the value of 0; x n refers to the independent variables affecting the distribution of sites; and β n is the regression coefficient of the logistic regression.
However, when the number of alternative independent variables is large, the variables may have a high correlativity, and some of them may not have a significant effect on the dependent variable. Hence, the target variables need to be screened for significance in the logistic regression analysis. The "Backward: LR" algorithm of SPSS software can automatically screen and eliminate variables with low significance relationships with the model and carry out iterative calculations to arrive at the ideal model parameters.
With that, a predictive model of the beacon sites was generated using the Map Algebra operation. Map algebra is a high level spatial modeling language that conducts mathematical operations on multiple elements of spatial scope 30 , and can be conducted by the "Map Algebra" tool in the ArcGIS software. Through Eq. (1), a locational probability map of the beacon sites in the Wenzhou area was obtained.

Model validation.
After establishing the predictive model, the validity and accuracy of the predictive model was evaluated. To meet this requirement, this study used the cross-validation method to evaluate the accuracy of the predictive model. Cross-validation is a statistical method based on the idea of data slicing, in which the whole dataset is split into several datasets according to the actual needs 31 . In this study, we divided the beacon site dataset into two groups, one of which was used as the training sample and the remaining as the validation sample. To ensure the spatial distribution density of training samples and nonsites equal, we selected 85% of sites as training samples in this study, that is, 58 sites were taken as training samples, and the remaining 10 sites were used as validation samples. Additionally, considering the homogeneity of the samples and further calculations, the sites in the uniformly distributed areas were selected as validation samples.
Viewshed analysis. Having confirmed the validity of the predictive model, a further analysis was then performed on this basis to precisely determine the prediction range. Due to the influence of visual factor, the beacon towers must be set up within the effective visual range of each other, so as to form one or more coherent information flows and serve as a qualified enemy alerting system. Thus, the beacon towers must be in places that not only meet the requirements of terrain and resources but also ensure the safety and no obstruction of the sightline. Based on that, the "Viewshed" tool of ArcGIS software was used for superimposed analysis. The fundamental principle is as follows: assuming there are beacons A and B in a certain area and the sightline is obstructed, it can be concluded that there was a beacon C between A and B that has not yet been discovered. Then, by calculating

Results and discussion
The calculations and analyses were conducted sequentially following the steps in Fig. 2. The statistical results of independent variables screening are shown in Table 2 and Fig. 4. Results showed that the beacon sites locations were preferred in terrain areas with altitude 0-300 m, slope 0-15°, and topographic relief 0-300 m, whereas showing no obvious preference for aspect; the distances from traffic routes are relatively close, mostly reachable within 10,000 m; and the distribution of beacon sites is relatively concentrated, mostly within 6000 m apart. While nonsites locations showed a random or opposite distribution pattern compared with sites. This suggested that the locations of beacon sites presented a greater correlation with altitude, slope, topographic relief, distance from traffic routes and nearest beacon tower or fort, while there was no obvious correlation with aspect. Therefore, in this study, altitude, slope, topographic relief, distance from traffic routes and distance from nearest beacon tower or fort were taken as independent variables in the predictive model of beacon sites.
With the aim of verifying the accuracy of the predictive model, this study divided beacon sites into two groups, ten sites for validation samples and 58 sites for experimental samples, as detailed in "Model validation". Extracting data on altitude (X1), slope (X2), topographic relief (X3), distance from traffic routes (X4) and distance from nearest beacon tower or fort (X5) of the 58 beacon sites and 100 nonsites in the experimental samples (Supplementary Table S3), a binary logistic regression analysis was conducted ( Table 3).
The calculation result indicated that although slope was one of the influencing factors of beacon tower distribution, the significance was not high in the model. Therefore, the final predictive model contained four independent variables, namely, altitude (X1), topographic relief (X3), distance from traffic routes (X4) and distance from nearest beacon tower or fort (X5).
The final locational probability map of the beacon sites in the Wenzhou area generated using Eq. (1) and it was classified into three probability classes: low (0-0.75), high (0.75-0.95) and very high (0.95-1) (Fig. 5). The extraction results of the probability values corresponding to the 10 validation samples, are shown in Table 4; the probability values corresponding to the 100 nonsites are shown in Supplementary Table S4. The data showed that all the 10 validation samples with probability values above 0.95, which were in the very high probability zone, indicated that the predictive model of beacon sites established in this study had a high efficiency and could be used for further research.
Next, the viewshed analysis was conducted. Since the distribution of beacon towers in the northern area of Wenzhou was relatively simple, this study selected the beacon towers within the jurisdiction of Jinxiang Wei in the southern area as a case for analysis (Fig. 6). Jinxiang Wei was the highest-ranking fort in the southern area of Wenzhou, and the military mobilization and war command of other forts within this area were under its administration, which was the final collection point for all kinds of military information. There were seven known experimental beacon sites, two validation sites and one fort site in the study area. Taking each site point as the observation point, the visible conditions among the sites were calculated by viewshed analysis, and the statistical results are shown in Table 5.
Through the preliminary viewshed analysis, it was found that the distance between the two beacon towers connected by sight is within 4 km, transmitting information to Jinxiang Wei from the southwest and southeast directions separately, which created a precise intelligence network. Among them, the Beishan beacon was within the visual area of Jinxiang Wei, and the Fenghuang beacon could also indirectly transmit information to Jinxiang Wei via the Beishan beacon. However, the transmission route between the Dayandun beacon and Jinxiang Wei and the transmission routes between the Lingtou beacon, Nanshan beacon and Beishan beacon were blocked so that the information from the Xiaoyandun beacon and Ma'anshan beacon could not be transmitted to Jinxiang Wei. Therefore, it was judged that other beacon towers that were not yet found might exist in areas A and B (Fig. 7).
The viewshed areas of Jinxiang Wei and Dayandun Beacon (Fig. 8a) and the viewshed areas of Nanshan Beacon, Lingtou Beacon and Beishan Beacon (Fig. 8b) were superimposed to obtain the possible existence areas of beacon sites in terms of visual factors.
By superimposing the viewshed analysis result on the distribution probability map of the beacon sites (Fig. 6), the final prediction result of beacon sites in the Jinxiang Wei area was obtained (Fig. 9a). To verify the accuracy of the final prediction result, we projected the validation samples of beacon sites onto the final probability map (Fig. 9b), and it could be seen that the actual beacon site locations basically matched the final prediction result, which demonstrated the practicability of the predictive method constructed in this study.
In addition, the following conclusions can be drawn from this study: (1) by comparing the final prediction result obtained after superimposing viewshed analysis (Fig. 9b) with the single logistic regression prediction result (Fig. 6), it was found that the predictive scope is reduced by more than 90%, which greatly improved the accuracy and predictive ability of site prediction and can provide decision-making guidance for archaeological excavation work to a certain extent, thus effectively saving manpower and material resources, reducing the blindness of archaeological work, and yielding twice the result with half the effort; and (2) during the research process, the independent variables that have a significant influence on the establishment of the predictive model can be screened out through data extraction and statistical analysis, and all those variables have been regarded as important environmental factors that influence ancient humans' residential choices by researchers in archaeological research. The results of GIS spatial analysis and logistic regression analysis suggested that the locational selection of beacon sites in Wenzhou was significantly influenced by traffic routes, followed by topographic relief and altitude, while slope did not have strong effects on site distribution. Moreover, the rules of siting for beacon Scientific Reports | (2022) 12:10773 | https://doi.org/10.1038/s41598-022-15067-z www.nature.com/scientificreports/ towers in ancient times could also be inferred: the ancient people attached great importance to building beacon towers in areas with wide vision and small topographic relief, not only could get a good view, easy to look out but also convenient for construction; additionally, they also preferred locations near transportation system, where enemy activities were frequent, to monitor their movements and to quickly transmit alerts to military forts for timely decision making and counterattack organization. www.nature.com/scientificreports/ Of course, the shortcomings of the method used in this study must be recognized: (1) the independent variables selected in this study are limited. The site selection of beacon towers might be closely related to military, politics, Fengshui, climate and other factors, which are difficult to extract quantitatively, so they are not considered influencing factors in this study; (2) there are some constraints in data acquisition. The research object was in ancient times, but ancient data could not be obtained, so modern data were used as a substitute; moreover, there were some human errors in data vectorization. If it is possible to obtain more accurate and reliable data, the prediction result will be more scientific and can further improve the reliability of the research; (3) the nonsites in the experimental samples come from the random points generated by ArcGIS software, among which are likely to be beacon sites waiting for excavation. When they act as nonsites in the predictive model, it will have a weak impact on the accuracy of the final prediction results. In addition, there might be some inevitable subjective factors when classifying the training and validation samples; and (4) there are certain limitations of the viewshed analysis method, which is only applicable to the case in which the beginning and ending beacon sites of a     Nevertheless, the use of GIS and statistical methods has transformed archaeological research from qualitative to quantitative analysis, and the quantification of data analysis has provided new perspectives for settlement archaeological research. The predictive method of Wenzhou's beacon sites proposed in this study has a huge breakthrough in prediction accuracy and ability compared with previous predictive methods, which can narrow the possible area of the beacon sites to a smaller scope and establish a predictive model with more practical predictive ability.

Conclusions
In this study, 68 discovered beacon sites and 100 random nonsites in the Wenzhou area were taken as experimental samples, which were divided into two parts: training samples and validation samples. Data extraction, statistical analysis and screening of each independent variable were carried out using GIS spatial analysis tools, and the distribution predictive model of beacon sites was established through logistic regression. On this basis, the viewshed analysis of the beacon sites was conducted according to their visual attributes to precisely determine the predictive scope, improve the accuracy and predictive ability of traditional prediction methods, and reduce the difficulty of archaeological work to the greatest extent possible. The results showed that the predictive method of beacon sites proposed in this study reduced the possible location scope by more than 90% and was proven to be scientific and reliable.
From the perspective of historical research, the predictive model can also be used to investigate the relationship between the beacon sites and various natural and cultural environmental parameters, revealing the deployment characteristics of mountain beacon towers and deepening the understanding of military heritage. In addition, for heritage management, the site predictive model has great potential for application. China has numerous cultural heritages, and even though three national relic surveys have been conducted, many sites are still missing. In the face of frequent infrastructure construction, finding ways to improve efficiency, save costs,  www.nature.com/scientificreports/ and protect the sites, that have not yet been discovered and registered, is a serious problem. The site predictive model can be used to predict the probability of site discovery in unknown locations and assist in protection and management decisions. Moreover, regional systematic investigation for settlement morphology research has been widely carried out nationwide, which provides the basic conditions for the establishment of archaeological site predictive models in various places. Therefore, the application and promotion of archaeological predictive models in heritage conservation and infrastructure construction are not only necessary but also practical.