Effects of land use/cover on surface water pollution based on remote sensing and 3D-EEM fluorescence data in the Jinghe Oasis

The key problem in the reasonable management of water is identifying the effective radius of surface water pollution. Remote sensing and three-dimensional fluorescence technologies were used to evaluate the effects of land use/cover on surface water pollution. The PARAFAC model and self-organizing map (SOM) neural network model were selected for this study. The results showed that four fluorescence components, microbial humic-like (C1), terrestrial humic-like organic (C2, C4), and protein-like organic (C3) substances, were successfully extracted by the PARAFAC factor analysis. Thirty water sampling points were selected to build 5 buffer zones. We found that the most significant relationships between land use and fluorescence components were within a 200 m buffer, and the maximum contributions to pollution were mainly from urban and salinized land sources. The clustering of land-use types and three-dimensional fluorescence peaks by the SOM neural network method demonstrated that the three-dimensional fluorescence peaks and land-use types could be grouped into 4 clusters. Principal factor analysis was selected to extract the two main fluorescence peaks from the four clustered fluorescence peaks; this study found that the relationships between salinized land, cropland and the fluorescence peaks of C1, W2, and W7 were significant by the stepwise multiple regression method.


Results and Analysis
Land use/cover characteristics at various spatial scales. To

PARAFAC model components from EEM.
Overall, all fluorescence EEM data were successfully resolved by the PARAFAC model. Figure 2 shows the contour profiles of the three PARAFAC components. Seven peaks were extracted from the water samples of the Jinghe Oasis, as shown in Table 1. However, W1 and W2 represent Raman scattering peaks; therefore, they are not considered in further calculations. Figure 3 shows the differences in C1, T2, T3, T5 and T7 at the Jinghe Oasis. C1, T2, T3, T5 and T7 represent photodegradation products, humic-like substances, photodegradation products, humic substances + recent materials and tyrosine-like substances, respectively. The value of C1 ranges between 179.2 and 2884, and the value of W1 ranges between 1013 and 9998. The high values of C1 and W1 suggest that the photodegradation product content varies across the watershed. The values of W2 ranged from 791 to 9991 and were highest in sample 17. The values of W5 ranged from 291 to 8991 and were highest in samples 34, 35 and 36.

Relationships between the fluorescence peak values and the land use and cover types.
At small scales, a positive correlation indicates the influence of built-up land on the fluorescence peak values: as the proportion of urban land increases, the concentrations of these water quality parameters increase and contamination becomes more serious (Table 2). It should be especially noted that C1 and W7 are influenced differently by urban land at the 200 m scale, with an R² of 0.95. Petroleum is not affected by the composition of any land use type. C1 and W7 are influenced by salinized land at all scales, and the R² increases as the scale increases. The R² is 0.58 for the 200 m radius of action. Therefore, the relationships between the fluorescence peak values and land use/cover were explored within a 200 m radius of action.
The contribution of land use/cover to water quality pollution in a 200 m radius. Spatial framework of land use/cover in a 200 m radius. With regard to network structure selection, a neural network with a more complicated structure generally has a better capability to address complicated non-linear problems. However, more complicated neural networks require a longer training time. Using more land use/cover type areas as inputs provides more abundant information; however, the correlation among indices also increases. The topological values were used to determine the grid size in this study, and the k-means clustering method was adopted to obtain the results. Overall, after the standard processing of the water quality data, the best network training effect was obtained from 36 (6 × 6) neurons (Figure 4), with quantization error (QE) and topographic error (TE) values of 1.033 and 0.001, respectively. The clustering results show that the distribution of the land use/cover type area varies among the clustering layers. Among the six clusters, the land use/cover type area is generally better in Cluster 4. However, Cluster 2 has 5 sampling points (11, 12, 24, 25 and 28), which contain a water body. In addition, the water body, forest-grassland, cropland, salinized land, desert and other areas are low in Cluster 1, which indicates relatively concentrated human activity. Cluster 1 has 5 sampling points, 1, 17, 8, 19, 29, and 30. Cluster 3 has 6 sampling points, 2, 3, 4, 5, 6 and 7. Cluster 4 has 10 sampling points, 1, 17, 15, 16, 9, 10, 14, 18, 26 and 29. Figure 5 shows the relations among the distribution of different variable classifications and water quality parameters.
Scientific Reports | (2018) 8:13099 | DOI:10.1038/s41598-018-31265-0
The land use and cover pattern and distribution of sampling points in Cluster 1 are as follows: the town of TuoTuo is west of the sampling points, point 8 is in the middle of the Jing River, points 19, 21, 22 and 23 are in the middle of the Bortala River, and points 29 and 30 are in the eastern part of the Bortala River. There is a large urban area in this region, so the cluster is defined as urban land-oriented. The main samples of Cluster 2 were from the Bortala River and Jing River estuary, where forest-grassland and salinized land were the main land use types; thus, this cluster is defined as having a grassland- and salinized land-oriented land use pattern. The rest of the samples were located near the farmland areas and were defined by a cropland-oriented utilization pattern. To further visualize the four components, the results are shown in Figure 6.
Spatial framework of the peak fluorescence value in the 200 m radius. Regarding network structure selection, a neural network with a more complicated structure generally has a better capability to address complicated non-linear problems. However, more complex neural networks require a longer training time. Using more fluorescence peak values as inputs provides more abundant information; however, the correlation among indices also increases. The topological values were used to determine the grid size in this study, and the k-means clustering method was adopted to obtain the results. Overall, after the standard processing of the water quality data, the best network training effect was obtained from 36 (6 × 6) neurons (Figure 7), and the quantization error (QE) and topographic error (TE) values were 1.043 and 0.001, respectively. The clustering results show that the distribution of the peak fluorescence value varies among the clustering layers. Among the four clusters, the peak fluorescence value distribution is generally better in Cluster 4. However, Cluster 2 only has 4 sampling points, 11, 12, 24, 25 and 28, which contain a water body. Cluster 1 has 5 sampling points, 1, 17, 8, 19, 29, and 30. Cluster 3 has 6 sampling points, 2, 3, 4, 5, 6 and 7. Cluster 4 has 10 sampling points, 1, 17, 15, 16, 9, 10, 14, 18, 26 and 29. Figure 8 shows the distribution relation among the various variables and water quality parameters of the different clusters. For example, C1, W2 and W7 are recorded in the right corner of the SOM network, thereby indicating a declining trend in the southern part of Ebinur Lake and the surrounding Kuitun River. The main samples of Cluster 2 were from the Bortala River and the Jing River, which flow into the lake. Cluster 3 is mainly a class of six samples from the Akeqisu River and the Kuitun River. The rest of the samples, in Cluster 4, are located near farmland. To further visualize the four components, the results are shown in Figure 9.
Principal components of fluorescence spectra under different clusters. Fluorescence data often contain considerable redundancy, which is an important consideration in spectral analysis. Principal component analysis was used to reduce the dimension of the data in this study. Three principal components were selected that accounted for a cumulative variance greater than 85%. The main components of the band and the main score matrix are shown in Table 3.
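The 85% cumulative-variance criterion can be illustrated with a minimal sketch. It assumes a hypothetical samples-by-peaks matrix (`peaks`, filled here with synthetic numbers) and plain mean-centering; the study does not state how the data were scaled, so this is an illustration rather than the procedure actually used.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.85):
    """Return (k, cumulative_ratio): k is the smallest number of principal
    components whose cumulative explained variance exceeds `threshold`."""
    Xc = X - X.mean(axis=0)                   # center each variable (assumed preprocessing)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values of the centered data
    var = s ** 2                              # proportional to the component variances
    ratio = np.cumsum(var) / var.sum()        # cumulative explained-variance ratio
    k = int(np.argmax(ratio > threshold)) + 1
    return k, ratio

# Synthetic stand-in for a (samples x fluorescence peaks) table; not study data.
rng = np.random.default_rng(0)
peaks = rng.normal(size=(30, 5))
k, ratio = n_components_for_variance(peaks)
```

Calling the function with a different `threshold` shows how sensitive the retained number of components is to the chosen cut-off.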
The contribution of land use/cover to water quality pollution in a 200 m radius. Using the multiple regression method, the contribution of each land use/cover type to the fluorescence peaks of the water body area was analyzed. R² and F values were used to test their correlations, as shown in Table 4. At cluster 1, the regression analysis between the land use/cover area and the principal components of the fluorescence spectra showed that there is no significant relationship between W7, W2 and the land type; similarly, there is no significant relationship between C1 and the corresponding clustering component of the land use, the contribution of salinized land and forest-grassland types (R² = 0.91). At cluster 2, the regression analysis between the land use/cover area and the principal components of the fluorescence spectra shows that there is a significant relationship between W2 and the urban land and cropland, with R² of 0.49 and 0.67, respectively. At cluster 3, the regression analysis between the land use/cover area and the principal components of the fluorescence spectra shows that there is no significant relationship between W2 and the desert or salinized land, with R² of 0.69. At cluster 4, the regression analysis between the land use/cover area and the principal components of the fluorescence spectra shows that there is no significant relationship between W2 and the salinized land and "other" area (R² is 0.66). Salinized and construction land types are major contributors to the dissolved organic matter affecting the surface water quality in the Jinghe Oasis.
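As a rough illustration of how R² and F values of this kind are obtained, the sketch below fits an ordinary least-squares regression of one fluorescence peak on land use areas. The variable names and the synthetic data are assumptions, and the stepwise variable selection used in the study is not implemented here.

```python
import numpy as np

def ols_r2_f(X, y):
    """Fit y = b0 + X @ b by least squares; return (coefficients, R^2, F)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])           # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)                  # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    f = (r2 / p) / ((1.0 - r2) / (n - p - 1))      # overall F statistic
    return beta, r2, f

# Synthetic example: one land use area (hypothetical) versus one peak intensity.
rng = np.random.default_rng(0)
salinized_area = rng.uniform(0.0, 10.0, size=30)
peak = 2.0 + 3.0 * salinized_area + rng.normal(scale=1.0, size=30)
beta, r2, f = ols_r2_f(salinized_area[:, None], peak)
```

With a single predictor, R² equals the squared Pearson correlation, which gives a quick sanity check on the fit.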

Discussion
The water environment is reflected by the fluorescence peak. Intensive land use in river watersheds and the rapid response of organic pollutants from different sources may cause the substantial deterioration of water quality, posing a direct or indirect threat to the quality of life of local people and the health of aquatic ecosystems [14,20]. Excitation-emission matrix (EEM) spectroscopy can be used to interpret a wide range of excitation and emission wavelengths contained within a variety of fluorescing water samples. In excitation-emission fluorescence, there is a relationship between fluorescence peaks and a number of water quality parameters [21].
The fluorescence peak of a water body reflects its environmental conditions. Research by Wang et al. indicates that 3D-fluorescence techniques are capable of estimating and monitoring surface water pollution in the Jinghe Oasis [14]. Fluorescence spectroscopy has received much attention in recent years due to its potential application in monitoring the water of rivers and lakes. This technique is attractive for monitoring the water quality in inland water bodies, as it is a rapid technique that requires no reagents and no sample preparation for analysis. The relationship between water quality and land use/cover is equivalent to the relationship between the fluorescence peak and land use/cover. Kiedrzyńska et al. found that cropland areas influenced nitrogen, and forest areas were negatively related to loads of both nitrogen and phosphorus [19], which are important organic matter factors in this study. Therefore, the results of this study are consistent with previous results.
The influence of spatial and temporal scales. The influence of land use/cover patterns on water quality fluorescence is scale dependent [22]. The results showed that the 200 m action radius was better than the other action radii in explaining the overall fluorescence variations. The 200 m action radius acts as a filter by reducing surface runoff and processing nutrients to improve the water quality of rivers [23][24][25][26]. The radius of action was first introduced to explore the effect of land use on water quality in arid areas. Although it is essentially a case of buffer analysis, it also provides a new way of thinking about the effect of land use on surface water quality. The radius of action is based on the analysis of geospatial data and lacks a strong theoretical basis, which is the primary problem to be addressed by future research.

Management suggestions.
The urban areas and salinized land in this watershed were mainly dispersed along the river, which had a negative effect on the river water quality. Consistent with a typical continental climate, the Jinghe Oasis is extremely dry and windy, with little rainfall and frequent dust storms. Therefore, it is important to control urban runoff and ensure that the water quality meets national standards. Salinized land had a strong impact on water quality at a large scale. Reasonable irrigation and soil salt improvements are important measures. Forest-grassland areas have strong contributions to water quality variations at the river scale. Landscape pattern planning should be used to improve the water quality of watersheds in the arid region of central Asia.

Conclusion
In the scope of applied conservation, understanding the impact of the surrounding land use/cover and human activities on water quality at multiple scales is essential for adopting scale-appropriate protection and rehabilitation strategies at the basin scale. The influence of the landscape on the water quality is scale dependent; identifying the appropriate scale makes water quality management more convenient, economical and safe. Therefore, the key problem of the effective management of surface water is identifying the effective radius of the surface water pollution and blocking the pollution source. The Jinghe Oasis is located on the China-Kazakhstan border in the Xinjiang Uyghur Autonomous Region of China; we demonstrated the potential of integrated remote sensing and three-dimensional fluorescence technologies to investigate the effect of land use/cover types on surface water. The PARAFAC model and self-organizing map (SOM) neural network model were used to determine the effects of land use/cover types on water quality.
(1) The four fluorescence components that were successfully extracted by the PARAFAC factor analysis modeling from the fluorescence EEM data are as follows: microbial humic-like (C1), terrestrial humic-like organic substances (C2, C4), and protein-like organic substances (C3).
(2) Taking 30 water sampling points to build 5 buffer zones (100 m, 200 m, 300 m, 400 m, and 500 m), we found that the most significant relationship between land use type and fluorescence components occurs within the 200 m radius, and the maximum contribution is from built-up land and salinized land.
(3) Typical three-dimensional fluorescence peaks and land use types were classified by the SOM neural network method, which demonstrated that four different types exist between the three-dimensional fluorescence peaks and land use types.
(4) Principal factor analysis applied to the four fluorescence peaks, together with the stepwise multiple regression method, showed that the land use types contributing most to surface water organic pollution in Ebinur Lake were salinized land and cropland, which were the contributing sources of the C1, W2 and W7 fluorescence peaks; salinized land was the largest contributing source of the W2 and W3 fluorescence peaks.
The relationship between the land use/cover and the water quality is complicated and can be influenced by numerous factors. From the perspective of the landscape, the effective radius of surface water pollution is not simply related to the land use types; it also depends on the spatial structure of the land use types. Therefore, the effective radius can better support water quality management in the watershed.

Study area. The Jinghe Oasis is located in the center of Eurasia in the northwest Xinjiang Uygur Autonomous Region at 44°02′∼45°10′N and 81°46′∼83°51′E. The Jinghe Oasis is composed of wetland and desert oasis vegetation and wildlife and is a national desert ecological reserve. The study area has a unique wetland ecological environment, and it has been listed as the Xinjiang Uygur Autonomous Region "Wetland Nature Reserve" (Figure 10). The Jinghe Oasis was once fed by 12 branch rivers belonging to three major river systems, and the major rivers were the Bortala River (B-R), Jinghe River (J-R), Aqikesu River (AQKS-R) and Kuitun River (K-R). Owing to natural environmental changes and human activities (i.e., modern oasis agricultural development), many rivers gradually lost their hydraulic connections with Ebinur Lake, and only the Bortala River and Jinghe River now supply water to Ebinur Lake. The western region of the Bortala River (B-R) valley is south of the Jing River (J-R) oases and the Dandagai Desert and east of the Mutetaer desert zone of the lower reaches of the Akeqisu-Kuitun River (AKQS-R). Ebinur Lake is located in the center of a watershed at the lowest elevation and represents a typical lake of the arid areas of Central Asia. The total watershed area is 50,621 km². It is surrounded by a mountainous region (24,317 km²; Alatau Mountains) and plain areas (26,304 km²) to the north, west and south [27,28]. The climate is a typical temperate arid continental climate, and the mountain-oasis-desert system has typical temperate arid ecological characteristics. The study region is located inland (2000 km away from the Pacific and Indian Ocean and 3000 km away from the Arctic Ocean); the moisture sources in the study area are derived from the Atlantic Ocean (7000 km), but overall, there is limited water vapor transport from maritime areas [29].
Data sources and data processing. Water sample acquisition and processing. Water samples were collected on 5 July 2016 from 29 locations within the Jinghe Oasis, a typical arid oasis. The collected samples were kept in cold storage (below 2 °C) during transport before the water quality measurements were carried out in the laboratory. Samples were transported in polyethylene plastic bottles, previously washed in 10% HCl and rinsed with deionized water, to minimize changes in the chemical characteristics of the water. The collected samples were filtered using a pre-washed GF/F filter. All fluorescence intensities were determined using a Cary Eclipse Fluorescence Spectrophotometer (F-7000, Hitachi High-Technology Corp, Tokyo, Japan). The spectrophotometer that measured the fluorescence spectra was equipped with a 150 W xenon arc lamp as the light source and two grating monochromators coupled with a slit as the EEM wavelength selectors. The scanning speed was set to 60000 nm/min. Therefore, each measurement period was 2 minutes in duration. The EEMs were measured over an excitation range of 200-450 nm and an emission range of 200-550 nm, both in 10 nm steps. MilliQ water EEMs were used as blanks and subtracted from each sample EEM. The emission and excitation correction files generated by the FluoroMax manufacturer were applied to each MilliQ-subtracted sample EEM. Fluorescence intensities were standardized to a Raman peak at 395 nm emission, as suggested by Lawaetz [30].
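The blank subtraction and Raman standardization described above might be sketched as follows. The excitation wavelength used for the Raman peak (350 nm) and the emission integration window (370-430 nm, bracketing the 395 nm peak on a 10 nm grid) are assumptions made for this illustration; the function names are hypothetical, and instrument correction files are not handled.

```python
import numpy as np

# Wavelength grids matching the stated scan ranges (10 nm steps).
ex = np.arange(200.0, 451.0, 10.0)   # excitation, nm
em = np.arange(200.0, 551.0, 10.0)   # emission, nm

def raman_area(blank_eem, ex, em, ex_raman=350.0, em_window=(370.0, 430.0)):
    """Integrate the water Raman peak of the blank (trapezoid rule).
    ex_raman and em_window are assumed values, not taken from the study."""
    i = int(np.argmin(np.abs(ex - ex_raman)))            # nearest excitation row
    mask = (em >= em_window[0]) & (em <= em_window[1])
    x, y = em[mask], blank_eem[i, mask]
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

def preprocess_eem(sample_eem, blank_eem, ex, em):
    """Subtract the MilliQ blank, then scale by the blank's Raman peak area."""
    return (sample_eem - blank_eem) / raman_area(blank_eem, ex, em)

# Toy EEMs with shape (n_excitation, n_emission); not measured data.
blank = np.ones((ex.size, em.size))
sample = 3.0 * np.ones((ex.size, em.size))
normalized = preprocess_eem(sample, blank, ex, em)
```

Dividing by an integrated Raman area rather than a single-point intensity is one common convention; either choice only rescales the EEMs by a constant per measurement session.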
Remote sensing data. Medium spatial resolution cloud-free GF-1 images were used in this study (Table 5) and were acquired near the actual water sampling date on April 12th, 2016. These data were obtained from CRESDA (China Resources Satellite Application Center, http://www.cresda.com/CN/). The images were acquired in clear and dry weather conditions during the dry season. These GF-1 satellite images have a ground sampling distance (GSD) of 16 m and a pan band GSD of 8 m. The GF-1 images contain 5 bands that record the reflected or emitted radiation from the Earth's surface in the B, G, R, NIR and pan bands of the electromagnetic spectrum.
The sensor and atmosphere can alter the spectral characteristics of a target in multitemporal remote sensing images, which affects the extraction of image data [31]. Therefore, the remote sensing images must undergo an atmospheric correction. The GF-1 images were first pre-processed using ENVI 5.1 (Environment for Visualizing Images 5.1; Exelis Visual Information Solutions, USA). The Universal Transverse Mercator (UTM) projection was selected to rectify the satellite images. The GF-1 images were also geometrically corrected against a previously corrected GF-1 image that had a geometric accuracy of <0.5 pixels [32]. Then, the ENVI 5.1 radiometric calibration tool and the gain and offset values in the GF-1 image header file were used for the radiometric calibration of the GF-1 data. Finally, the ENVI 5.1 FLAASH atmospheric correction model was used for the atmospheric correction of the remote sensing images.
Methods. The methodology is explained in the following sections, together with a conceptual flow chart. Figure S1 shows the workflow of the study.
Data Fusion. Data fusion is useful because it takes advantage of different spectral and/or resolution information for effective image interpretation. The pan band and the four multispectral band images were used for the image fusion [33]. We implemented the selective principal component analysis (S-PCA) transformation rather than the conventional standard PCA method.
Land use/land cover clustering based on decision tree classification. The decision tree (DT) classifier is a simple and widely used classification technique. The DT classifier is an effective method to incorporate a variety of data types from multiple sources to find pixels that fulfill the criteria [34][35][36]. We conducted radiation and orthographic corrections for the remote sensing image data combined with 1:50,000 digital elevation model (DEM) data. Using the Environment for Visualizing Images software (ENVI Version 5.0), we established seven land use/cover types based on the actual conditions of the research zone: urban land, cropland, forest-grassland, water body, salinized land, desert and others. The final results showed that the producer's accuracy of the classified LCLUC maps was 85.29% and the user's accuracy was 84.47%. The overall accuracy was 89%, and the kappa coefficient was 0.88.
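For readers unfamiliar with these accuracy measures, the following sketch shows how overall accuracy, producer's and user's accuracies and the kappa coefficient are commonly derived from a confusion matrix. The two-class matrix is invented for illustration and is unrelated to the classification results reported here.

```python
import numpy as np

def accuracy_metrics(cm):
    """cm[r, c] = number of samples of reference class r mapped as class c."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / n
    producers = diag / cm.sum(axis=1)      # per reference class (omission side)
    users = diag / cm.sum(axis=0)          # per mapped class (commission side)
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    kappa = (overall - expected) / (1.0 - expected)
    return overall, producers, users, kappa

# Invented 2-class example (rows: reference, columns: classified).
cm = [[45, 5],
      [10, 40]]
overall, producers, users, kappa = accuracy_metrics(cm)
```

In this toy matrix the overall accuracy is 0.85 while kappa is 0.70, which shows how kappa discounts the agreement expected by chance.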
PARAFAC Modeling. PARAFAC applications allow full use of the fluorescence EEM data samples. Fluorescence EEM data are three-way (multiway) data: the fluorescence of each sample varies with the wavelength at which light is absorbed (excitation) and the wavelength at which fluorescence is observed (emission). PARAFAC decomposes the EEM dataset into a set of trilinear terms and a residual array [37] and fits the following equation by minimizing the residual sum of squares of the trilinear model:

$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}$$

where $x_{ijk}$ is the intensity of the three-way data array for the ith sample at emission wavelength j and excitation wavelength k; $a_{if}$ is directly proportional to the concentration of the fth component in the ith sample (defined as scores); $b_{jf}$ and $c_{kf}$ are the estimated emission and excitation loadings of the fth component at wavelengths j and k; F represents the number of components in the model; and $e_{ijk}$ is the residual element, representing the variability not accounted for by the model [38,39].
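The trilinear structure of the PARAFAC model, in which each array element is a sum over F components of a sample score times an emission loading times an excitation loading, plus a residual, can be illustrated with a small unconstrained alternating-least-squares sketch. This is not the toolbox algorithm used in the study: it omits non-negativity constraints, convergence checks and validation, and the array sizes and function names are assumptions.

```python
import numpy as np

def reconstruct(A, B, C):
    # x_hat[i, j, k] = sum_f A[i, f] * B[j, f] * C[k, f]
    return np.einsum("if,jf,kf->ijk", A, B, C)

def residual_ss(X, A, B, C):
    # Residual sum of squares that the PARAFAC fit minimizes.
    return float(np.sum((X - reconstruct(A, B, C)) ** 2))

def parafac_als(X, n_components, n_iter=200, seed=0):
    """Plain alternating least squares: update one factor matrix at a time."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, n_components))   # sample scores
    B = rng.random((J, n_components))   # emission loadings
    C = rng.random((K, n_components))   # excitation loadings
    history = []
    for _ in range(n_iter):
        A = np.einsum("ijk,jf,kf->if", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,if,kf->jf", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,if,jf->kf", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        history.append(residual_ss(X, A, B, C))
    return A, B, C, history

# Synthetic, noise-free three-way array built from known factors (not study data).
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((29, 4)), rng.random((36, 4)), rng.random((26, 4))
X = reconstruct(A0, B0, C0)
A, B, C, history = parafac_als(X, n_components=4)
```

Because each update solves its least-squares subproblem exactly, the recorded residual sum of squares should not increase from one sweep to the next, which is a convenient check when experimenting with the number of components.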
MATLAB 2014a (MathWorks, Natick, MA, USA) and the DOMFluor toolbox (http://www.models.life.ku.dk) were used to perform the complete PARAFAC modeling. Residual analysis, core consistency and visual inspection of the spectral loadings of each component were used to diagnose the correct number of components [37]. The fluorescence intensity of each component ($I_i$) was estimated using the following formula [40]:

$$I_i = \mathrm{Score}_i \times Ex_i(\lambda_{\max}) \times Em_i(\lambda_{\max})$$

where $\mathrm{Score}_i$ represents the relative fluorescence intensity (score) of the ith component; $Ex_i(\lambda_{\max})$ represents the maximum excitation loading of the ith component; and $Em_i(\lambda_{\max})$ represents the maximum emission loading of the ith component.
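The component intensity calculation, a sample score multiplied by the maximum excitation loading and the maximum emission loading of the same component, reduces to a few array operations. The sketch below assumes score and loading matrices laid out with one column per component; the numbers are made up.

```python
import numpy as np

def component_intensity(scores, ex_loadings, em_loadings):
    """scores: (n_samples, F); ex_loadings: (n_ex, F); em_loadings: (n_em, F).
    Returns an (n_samples, F) array: score times the maximum excitation loading
    times the maximum emission loading of each component."""
    return scores * ex_loadings.max(axis=0) * em_loadings.max(axis=0)

# Made-up values for one sample and two components.
scores = np.array([[2.0, 1.0]])
ex_loadings = np.array([[0.1, 0.4],
                        [0.5, 0.2]])
em_loadings = np.array([[0.2, 0.05],
                        [0.1, 0.1]])
intensity = component_intensity(scores, ex_loadings, em_loadings)
```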
Analysis of water pollution radius. The "radius of action" was first introduced in the engineering field to discuss the quantity of blasting and the accurate blasting scope in the multiboundary blasting system. The term "radius of water quality action" was introduced to understand the effective range of land use in watersheds. Building on the results of previous studies, we explored the effect of land use on the water quality in rivers at a range of scales: 100 m, 200 m, 300 m, 400 m and 500 m. The framework is designed as shown in Figure S2.
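One way to picture the multi-scale buffer analysis is to compute, for each sampling point, the share of every land use class within circles of 100 m to 500 m. The sketch below works on a classified raster with 16 m pixels (the stated GF-1 multispectral GSD); the class codes, raster and point location are hypothetical, and a GIS buffer tool would normally be used instead.

```python
import numpy as np

def buffer_composition(landuse, point_rc, radius_m, pixel_m=16.0, n_classes=7):
    """Fraction of each land use class among pixels whose centers lie within
    radius_m of the sampling point (given as row, column indices)."""
    rows, cols = np.indices(landuse.shape)
    dist = np.hypot(rows - point_rc[0], cols - point_rc[1]) * pixel_m
    inside = dist <= radius_m
    counts = np.bincount(landuse[inside], minlength=n_classes)
    return counts / counts.sum()

# Hypothetical classified raster: class 1 on the left half, class 0 on the right.
landuse = np.zeros((64, 64), dtype=int)
landuse[:, :32] = 1
point = (32, 32)
composition = {r: buffer_composition(landuse, point, r) for r in (100, 200, 300, 400, 500)}
```

Each entry of `composition` is a vector of class fractions that could then be regressed against the fluorescence peak values at the corresponding radius.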
Recognition of land use/cover and fluorescent component spatial characteristics based on the self-organizing map (SOM) method. A self-organizing map (SOM) is one of the branches of artificial neural network algorithms. It is a self-organizing and self-learning network visual method that can express multidimensional spatial data in low-dimensional points through non-linear mapping [41]. The SOM is an all-purpose classification tool that can connect samples with variables [42]. In recent years, the SOM has become increasingly popular in environmental research because of its capacity to address non-linear relations. The idea motivating the development of the SOM method was to represent a large amount of data typical of samples. The map usually has a 2D structure with a map unit associated with a weight vector.
In this map structure, $N_{ij}$ is a unit of the 2D map grid (also called a neuron); $W_{ij}$ is the weight vector assigned to unit (i, j) of the SOM architecture; and L and M are the numbers of rows and columns, respectively [43][44][45]. The steps of the SOM algorithm are as follows:
• Step 1: The data are normalized and the SOM network is initialized; the weight vectors $w_{ij}$ (i = 1, 2, …, S; j = 1, 2, 3, …, R) are randomly set in the interval [0, 1], where R is the sample dimension and S is the number of output neurons.
• Step 4: The weight vectors $w_{ij}$ in the neighborhood region are updated.
• Step 5: The process returns to step 2 and proceeds iteratively until the specified number of iteration steps is reached.
The SOM technique has a distinct capability to represent the complex relationships of the fluorescence intensity data and the land use area data using component planes and U-matrix. All simulations were implemented in MATLAB R2014a using an SOM toolbox.
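A compact sketch of SOM training on a 6 × 6 grid is given below. Random initialization in [0, 1], the neighborhood update and the iteration loop follow the steps listed above; selecting the best-matching unit by Euclidean distance, the Gaussian neighborhood and the linear decay schedules are standard choices assumed here, not details confirmed by the text. The data are synthetic, and neither the topographic error nor the k-means step is included.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Online SOM training; data is (n_samples, dim), scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    weights = rng.random((rows, cols, dim))               # random weights in [0, 1)
    grid = np.stack(np.indices((rows, cols)), axis=-1)    # grid coordinates of each unit
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                           # assumed linear decay
        sigma = 0.5 + (sigma0 - 0.5) * (1.0 - frac)       # assumed shrinking neighborhood
        x = data[rng.integers(n)]                         # present one random sample
        d = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(d), d.shape)     # best-matching unit (assumed rule)
        g2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)  # squared grid distance to BMU
        h = np.exp(-g2 / (2.0 * sigma ** 2))              # Gaussian neighborhood
        weights += lr * h[..., None] * (x - weights)      # neighborhood weight update
    return weights

def quantization_error(data, weights):
    """Mean distance between each sample and its best-matching unit (QE)."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

# Synthetic stand-in for 30 sampling points x 7 land use areas; min-max normalized.
rng = np.random.default_rng(1)
raw = rng.random((30, 7)) * 100.0
data = (raw - raw.min(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))
weights = train_som(data)
qe = quantization_error(data, weights)
```

Each slice `weights[:, :, v]` corresponds to the component plane of variable `v`, which is the kind of view used to compare land use areas and fluorescence peaks across the map.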

Statistical Analyses.
A descriptive statistical analysis was applied to evaluate the water quality indices, the EEM-PARAFAC components and the 3D fluorescence spectral indices. Linear regression and correlation analyses were performed using Origin 8.0 (OriginLab Corporation, USA). The significance of the correlations was evaluated using P values and t values.
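The t value attached to a correlation can be illustrated with a short sketch. It computes the Pearson coefficient r and the statistic t = r·sqrt((n − 2)/(1 − r²)); the corresponding P value would come from a Student t distribution with n − 2 degrees of freedom and is not computed here. The data are invented, and this is not a reproduction of the Origin workflow.

```python
import numpy as np

def pearson_r_t(x, y):
    """Return (r, t) for the Pearson correlation between x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))   # test statistic with n - 2 d.f.
    return r, t

# Invented paired observations.
r, t = pearson_r_t([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```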