Reconstructing missing time-varying land subsidence data using back propagation neural network with principal component analysis

Land subsidence, a complex geophysical phenomenon, necessitates comprehensive time-varying data to understand regional subsidence patterns over time. This article focuses on the crucial task of reconstructing missing time-varying land subsidence data in the Choshui Delta, Taiwan. We propose a novel algorithm that leverages a multi-factorial perspective to accurately reconstruct the missing time-varying land subsidence data. By considering eight influential factors, our method seeks to capture the intricate interplay among these variables in the land subsidence process. Utilizing Principal Component Analysis (PCA), we ascertain the significance of these influencing factors and their principal components in relation to land subsidence. To reconstruct the absent time-dependent land subsidence data using PCA-derived principal components, we employ the backpropagation neural network. We illustrate the approach using data from three multi-layer compaction monitoring wells from 2008 to 2021 in a highly subsiding region within the study area. The proposed model is validated, and the resulting network is used to reconstruct the missing time-varying subsidence data. The accuracy of the reconstructed data is evaluated using metrics such as root mean square error and coefficient of determination. The results demonstrate the high accuracy of the proposed neural network model, which obviates the need for a sophisticated hydrogeological numerical model involving corresponding soil compaction parameters.


Study area
During the 1970s, researchers noted instances of subsidence along the southern coastal regions of the Choshui Delta, located on the west coast of central Taiwan 8,12,13. This phenomenon escalated in severity, resulting in detrimental effects on public infrastructure and various other issues. Although subsidence in coastal areas has decelerated over the past decade, it persists in inland regions. Presently, within the entire delta, the central zone is experiencing the highest rate of subsidence. According to the Water Resources Agency (WRA) under the Ministry of Economic Affairs of Taiwan, Yunlin County in the Choshui Delta registered an annual subsidence rate of 7.9 cm in 2022, the highest in Taiwan, as illustrated in Fig. 1.
Notably, the most pronounced subsidence occurs in Tuku and Yuanchang Townships within the central Choshui Delta. Accordingly, the study area under investigation is the Choshui Delta, located in western Taiwan. The Choshui Delta encompasses an area of 2000 km² with elevations ranging from 0 to 100 m (Fig. 2). The primary river, the Choshui River, originates from the western part of the central mountain range, flowing between
Due to excessive groundwater extraction, the central area of the Choshui Delta faces significant land subsidence issues 8,12,13. Figure 3 illustrates the accumulated subsidence from 2011 to 2020 in the depth range of 0 to 60 m. As shown in Fig. 3, the inland regions of Yunlin County contain the most severe subsidence areas, specifically in Huwei Township, Tuku Township, and Yuanchang Township. Consequently, Multi-Layer Compaction

Datasets
The geographical location of the study area, which includes STES, YCES, and NLPS, is depicted in Fig. 4. In this study, several time-dependent factors, including groundwater level data, electricity consumption data, and precipitation data, are recognized as influential factors in land subsidence. These factors fall within the category of extrinsic factors associated with human activities. Monthly fluctuations in groundwater levels and electricity consumption (a proxy indicator for estimating groundwater usage) are typically the major contributors to accelerated subsidence. Furthermore, precipitation may also play a crucial role in recharging groundwater, which in turn impacts subsidence.
However, in addition to the factors mentioned above, other variables such as land use patterns, sediment type, and drainage path length could potentially impact land subsidence. For instance, intensified agricultural activity may result in land subsidence, particularly in regions with extensive irrigation practices. Fine-grained soils may be susceptible to land subsidence when subjected to excessive groundwater extraction. Thus, the percentage of fine-grained soil and the length of the average maximum drainage path may be considered relevant factors influencing land subsidence.
Table 1 lists the source data utilized in this study. These datasets consist of the cumulative land subsidence data obtained from levelling surveys and MLCWs, groundwater level data, electricity consumption data, and precipitation data. The cumulative land subsidence data and groundwater level data are publicly accessible and sourced from the WRA; the electricity consumption data are also sourced from the WRA. Precipitation data are acquired from the Central Weather Bureau. The percentage of fine-grained soil and the length of the average maximum drainage path are derived from borehole logging data 30, as shown in Fig. 5, provided by the Central Geological Survey (CGS) and WRA of Taiwan. The current state of land use, essential for calculating the

Monthly compaction change
Land subsidence can be primarily classified into three categories: subsidence resulting from groundwater extraction, subsidence triggered by the weight of structures, and subsidence caused by the natural consolidation of alluvial soil. Land subsidence datasets consist of the cumulative land subsidence data obtained from levelling surveys and MLCWs. Levelling surveys are a fundamental technique used in land surveying engineering to determine the relative elevations of different points on the Earth's surface. The MLCW technique is adopted to survey the compaction at different depths. The MLCW is a specialized monitoring instrument used to measure and assess land subsidence, particularly in areas where excessive groundwater extraction is a concern. The MLCW is designed with multiple sensors or observation points at different depths within the ground. These sensors record variations in the distance between them over time, allowing researchers to detect changes in the soil's compaction or compression at various depths. The primary purpose of MLCWs is to provide detailed and precise data on how land subsidence occurs at different layers beneath the surface. This information is crucial for understanding the subsidence process. MLCWs are valuable tools in regions prone to land subsidence, such as areas with excessive groundwater pumping or geological conditions that promote compaction of the subsurface materials.
The first MLCW of the subsidence network was installed in 2008, and 31 MLCWs have been deployed in the Choshui Delta since then 11,14. The time-varying subsidence data from the MLCWs are crucial for precisely investigating the compression of the soil at spatial and temporal scales. The monitoring depth of the MLCWs ranges from 2.4 to 340 m. In the MLCW monitoring technique, rings refer to different sections or layers within the well that are instrumented to measure compaction at various depths. Each ring provides data on subsidence at a specific depth range, and the variation observed between two neighboring rings depicts the deformation of the stratigraphic profile spanning between them. This information helps in understanding how subsidence varies with depth in the soil profile. MLCWs function by measuring the compaction between these rings over time to monitor land subsidence. The contribution of the compaction of each soil layer to the total subsidence is then measured. The MLCW has the advantage of monitoring subsidence with a high accuracy of 1 mm 11. The monthly compaction change is calculated as follows.
$$\Delta C = C_i - C_{i-1}$$
where $\Delta C$ denotes the monthly compaction change, $C_i$ denotes the accumulated subsidence at the i-th month, and $C_{i-1}$ denotes the accumulated subsidence at the (i-1)-th month.
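The month-to-month differencing described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `monthly_change` and the sample readings are hypothetical and are not taken from the MLCW records.

```python
def monthly_change(cumulative):
    """Return the first differences x[i] - x[i-1] of a monthly series.

    `cumulative` holds one accumulated value per month, in time order.
    The result has one element fewer than the input.
    """
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]


# Hypothetical accumulated subsidence readings in mm, one per month.
cumulative_subsidence_mm = [0, 4, 11, 13, 20]
compaction_change_mm = monthly_change(cumulative_subsidence_mm)
print(compaction_change_mm)
```

The same differencing applies unchanged to any monthly series that is converted to a month-to-month variation, such as groundwater level, electricity consumption, or precipitation.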
In this study, the MLCWs installed at STES, YCES, and NLPS are adopted because these sites are situated in the area of highest subsidence, as shown in Fig. 4. The plots of monthly compaction change versus year at STES, YCES, and NLPS are shown in Fig. 6a,b,c, respectively. The subsidence data from the MLCWs installed at STES, YCES, and NLPS are not available from 2012 to 2014. The missing time-varying subsidence data are reconstructed using the neural network in this study.

Monthly groundwater level variation
Previous research reveals that groundwater exploitation is the major factor inducing land subsidence 1,2,6. Accordingly, the groundwater level records are selected as one of the input features. The well depths of the multi-layer groundwater level monitoring wells at STES, YCES, and NLPS are 134 m, 90 m, and 189 m, respectively. The monthly groundwater level variation is calculated as follows.
$$\Delta G = G_i - G_{i-1}$$
where $\Delta G$ denotes the monthly groundwater level variation, $G_i$ denotes the groundwater level at the i-th month, and $G_{i-1}$ denotes the groundwater level at the (i-1)-th month. Figure 7a,b,c illustrates the monthly groundwater level variation at STES, YCES, and NLPS, respectively. The groundwater level data show obvious seasonal changes between the wet and dry seasons every year.

Monthly electricity consumption of managed wells
Land subsidence is a recognized consequence of excessive groundwater exploitation, making the investigation of groundwater usage a critical aspect of this study. However, data directly related to well discharge and groundwater usage are unavailable. Consequently, we conducted a correlation analysis to explore the relationship between pumping rate and the electricity consumption of managed wells. For this analysis, we focused on a total of 107 wells located within a 2500 m radius of the STES. Within this radius, the pumping rate and electricity consumption exhibit a strong positive correlation, with a correlation coefficient of 0.97. Accordingly, we employ electricity consumption by wells as a proxy indicator for estimating pumping rate, which in turn represents groundwater usage.
The monitored electricity consumption data from 2017 to 2021 for the managed wells within a 250 m radius of each MLCW were analyzed. Electricity consumption data were collected from 39, 18, and 27 managed wells within the 250 m buffer region at STES, YCES, and NLPS, respectively. Figure 8a,b,c illustrates the total electricity consumption of managed wells for each month versus year at STES, YCES, and NLPS, respectively. The total electricity consumption of managed wells shows obvious seasonal changes between the wet and dry seasons every year. Based on the total electricity consumption of managed wells, the monthly electricity consumption variation of managed wells can be evaluated as
$$\Delta E = E_i - E_{i-1}$$
where $\Delta E$ is the monthly electricity consumption variation, $E_i$ is the electricity consumption at the i-th month, and $E_{i-1}$ is the electricity consumption at the (i-1)-th month. The electricity consumption of wells in the buffer region is composed of time series of electricity consumption recorded at selected managed wells distributed over the study area.

Monthly precipitation
Precipitation plays a pivotal role in influencing land subsidence. Positive values of precipitation variation indicate an increase in rainfall, which can contribute to higher groundwater recharge and may lead to reduced land subsidence. Negative values of precipitation variation signify a decrease in rainfall, potentially resulting in less groundwater recharge and more significant land subsidence. Accordingly, the monthly precipitation records are selected as one of the input features. The total monthly precipitation data are from the Central Weather Bureau. Figure 9a,b,c illustrates the total precipitation for each month versus year at STES, YCES, and NLPS, respectively. Rainfall is concentrated from June to September, which accounts for around 80% of the total annual precipitation. The variation of average monthly precipitation was calculated as follows:
$$\Delta R = R_i - R_{i-1}$$
where $\Delta R$ denotes the variation of average monthly precipitation, $R_i$ denotes the average monthly precipitation at the i-th month, and $R_{i-1}$ denotes the average monthly precipitation at the (i-1)-th month.

Percentage of agricultural land use
Increased agricultural activity may have an impact on land subsidence, especially in areas with extensive irrigation practices. Figure 4 provides a thematic map illustrating the land use inventory. The study area is segmented into several distinct land use categories, including agricultural land use, aquacultural land use, livestock land use, manufacturing land use, and other regulated districts 31, as depicted in Fig. 4. This depiction highlights that agriculture predominantly characterizes the land use in the study area, which includes STES, YCES, and NLPS.
In this study, we computed the percentage of agricultural land use within a 250 m radius of each MLCW. To perform this calculation, we utilized the buffer analysis tool in ArcGIS, which generates buffer polygons around input features at a specified distance for spatial analysis. This analysis allowed us to determine the proportion of agricultural land within the designated area. The percentage of agricultural land use is defined as follows.
$$P_A = \frac{A_a}{A_T} \times 100\%$$
where $P_A$ is the percentage of agricultural land use, $A_a$ is the area of agricultural land use in the division unit, and $A_T$ is the total area of the unit.

Percentage of fine-grained soil
Fine-grained soils may be susceptible to compaction when subjected to excessive groundwater extraction 6,11,13. This compaction may result in land subsidence. The percentage of fine-grained soil was derived from the borehole data of the CGS and WRA of Taiwan. In accordance with the Unified Soil Classification System, fine-grained soils are characterized by the fact that 50% or more of their particles pass through the No. 200 sieve 32. In this study, fine-grained soils encompass three distinct types: fine sand, silt, and clay. The percentage of fine-grained soil is determined by calculating the ratio of the combined thickness of fine sand, silt, and clay layers to the total drilling depth 32. The percentage of fine-grained soil is evaluated using the following equation
$$P_F = \frac{H_F}{H_T} \times 100\%$$
where $P_F$ is the percentage of fine-grained soil, $H_F$ is the thickness of fine-grained soil, and $H_T$ is the total soil thickness.
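The thickness ratio described above can be illustrated with a short Python sketch. The borehole log format (a list of soil type and thickness pairs), the function name `fine_grained_percentage`, and the sample layers are hypothetical; only the set of soil types counted as fine-grained follows the text.

```python
# Soil types counted as fine-grained in the text.
FINE_GRAINED = {"fine sand", "silt", "clay"}


def fine_grained_percentage(layers):
    """Percentage of total logged thickness made up of fine-grained layers.

    `layers` is a list of (soil_type, thickness_m) tuples from one borehole.
    """
    total = sum(thickness for _, thickness in layers)
    if total <= 0:
        raise ValueError("total logged thickness must be positive")
    fine = sum(thickness for soil, thickness in layers if soil in FINE_GRAINED)
    return 100.0 * fine / total


# Hypothetical borehole log, for illustration only.
borehole = [("clay", 10.0), ("coarse sand", 20.0), ("silt", 5.0), ("gravel", 15.0)]
print(fine_grained_percentage(borehole))
```

The percentage of agricultural land use has the same structure, with areas inside the buffer polygon taking the place of layer thicknesses.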

Length of the average maximum drainage path
To describe the deformation of fine-grained soils under consolidation, it is crucial to consider the length of the average maximum drainage path 6,11,13 .Considering the top and bottom drainage conditions for the soil layer, the length of average maximum drainage path is defined as the average drainage path length 32 , which can be expressed as follows.
where $n$ denotes the number of fine-grained soil layers, $H_{dr}$ denotes the length of the average maximum drainage path during compaction, and $H_{if}$ denotes the thickness of the i-th fine-grained soil layer.
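As a rough illustration of these definitions, the sketch below computes an average drainage path over a set of fine-grained layers. It rests on an assumption that is not confirmed by the text: each layer is taken to drain at both its top and bottom, so that its maximum drainage path is half its thickness, and these half-thicknesses are averaged over the n layers. The function name and sample thicknesses are hypothetical.

```python
def average_max_drainage_path(fine_layer_thicknesses):
    """Average of per-layer maximum drainage paths.

    ASSUMPTION (not stated explicitly in the source): every fine-grained
    layer is doubly drained, so its maximum drainage path is thickness / 2.
    """
    n = len(fine_layer_thicknesses)
    if n == 0:
        raise ValueError("at least one fine-grained layer is required")
    return sum(h / 2.0 for h in fine_layer_thicknesses) / n


# Hypothetical fine-grained layer thicknesses in metres.
thicknesses_m = [10.0, 5.0, 3.0]
print(average_max_drainage_path(thicknesses_m))
```

If a layer drains on one side only, its drainage path would instead equal its full thickness, and the per-layer term would need to change accordingly.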

Methodology

Principal component analysis (PCA)
In the Choshui Delta, extensive and long-term environmental monitoring has been conducted over the years, encompassing groundwater level observations, rainfall measurements, and land subsidence monitoring, resulting in a substantial amount of available data 8,10-12,14. The primary objectives include gaining insights into groundwater hydrology, meteorological hydrology, as well as the compressional characteristics of subsurface geological formations and the land subsidence patterns of various soil layers at different depths. Because many of these data can potentially serve as input factors, it becomes essential to identify the relevant and meaningful factors for neural networks.
To address this challenge, we use PCA, a statistical technique that effectively reduces data dimensionality while retaining the crucial information.
Our approach is designed to capture the intricate interactions among these variables within the context of land subsidence.
We propose a novel algorithm that leverages a multi-factorial perspective to accurately reconstruct the missing time-varying land subsidence data. By considering eight influential factors, our method seeks to capture the intricate interplay among these variables in the land subsidence process. Utilizing PCA, we ascertain the significance of these influencing factors and their principal components in relation to land subsidence. To reconstruct the absent time-dependent land subsidence data using PCA-derived principal components, we employ the backpropagation neural network.
Furthermore, the PCA results can influence the selection of input variables for the backpropagation neural network.By identifying the principal components that explain the most variance in the data, we can choose principal components as inputs for the neural network.This selection can enhance the network's training and predictive performance.
The PCA was carried out to obtain a set of principal components (PCs) that are linearly uncorrelated, defined as
where λ is the eigenvalue, X represents the input data, and A represents a matrix. Using the linear transformation, we obtain the following equations:
where E is the PC (eigenvector), and Y is the transformed variable. Equations (6) and (7) can be rewritten as
where n is the number of features. According to the above transformation, dimensionality reduction is achieved, and the dimensionality of the original input data X is reduced from m to q. The original X is converted into the transformed variable Y by using the PCs as the weights. Therefore, the following equations are obtained
where S is the covariance matrix defined as
After computing the covariance matrix, the correlations are then identified. Equations (10) and (11) are rewritten as the following equations when dimensionality reduction is unnecessary,
Finally, the eigenvectors and eigenvalues of the covariance matrix are computed to identify the PCs 13,33.
In this study, PCA serves as a preprocessing step to assess the relationships between influencing factors and land subsidence, thereby enhancing data analysis and modeling. Its primary roles include the identification of influential factors, dataset simplification, and the potential enhancement of subsequent BPNN performance. PCA is a linear dimensionality reduction technique that is primarily designed to capture linear relationships between variables. It works by finding linear combinations of the original variables that maximize the variance in the data. Consequently, PCA is limited in its ability to capture non-linear relationships between subsidence and predictor variables. It is important to clarify that PCA itself does not directly resolve the issue of filling data gaps. Instead, it assists in understanding the underlying data structure and selecting the most relevant variables for modeling, which can indirectly improve the handling of missing data. PCA provides a comprehensive view of the data's internal structure, making it suitable for scenarios where variables may have intricate interactions. While correlation analysis is valuable, it may not capture all aspects of data complexity.
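The general procedure outlined above (form the covariance matrix, compute its eigenvalues and eigenvectors, and project the data onto the leading eigenvectors) can be sketched with NumPy as follows. This is a generic illustration rather than the authors' implementation: the function name `pca`, the standardization step, and the synthetic input matrix are assumptions introduced here.

```python
import numpy as np


def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix.

    X has shape (samples, features). Returns the projected scores, the
    eigenvalues (descending), the eigenvectors (columns), and each
    component's contribution rate.
    """
    X = np.asarray(X, dtype=float)
    # ASSUMPTION: features are standardized so that differing units
    # (m, kWh, mm) do not dominate; the source does not state this step.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    S = np.cov(Z, rowvar=False)            # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()
    scores = Z @ eigvecs[:, :n_components]  # transformed variables
    return scores, eigvals, eigvecs, contribution


# Synthetic data with two correlated and two independent features.
rng = np.random.default_rng(0)
base = rng.normal(size=(120, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(120, 1)),
    2.0 * base + 0.5 * rng.normal(size=(120, 1)),
    rng.normal(size=(120, 1)),
    rng.normal(size=(120, 1)),
])
scores, eigvals, eigvecs, contribution = pca(X, n_components=3)
print(np.cumsum(contribution))
```

Because the eigenvectors of a symmetric matrix are orthogonal, the columns of `scores` are mutually uncorrelated, which is the property that makes them convenient network inputs.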

Artificial neural network
The spatiotemporal modeling of subsidence integrates the spatial characteristics and temporal nonlinearity of land subsidence. The overall framework comprises two main aspects: the construction of a spatiotemporal dataset and the modeling of land subsidence in the spatiotemporal domain 15,16. The spatiotemporal dataset is constructed from the time series input features obtained from WRA leveling surveys and MLCWs. The spatiotemporal modeling involves three components: temporal evolution modeling, spatial correlation analysis, and spatiotemporal integration. Finally, the model is trained on a substantial amount of time series data (February 2008 to February 2012 and April 2014 to June 2021) on land subsidence collected in Yunlin County. The structure of a basic BPNN is shown in Fig. 10. For time series prediction of land subsidence from groundwater withdrawals using an artificial neural network (ANN) 20,28, the training phase and the achieved outcomes are characterized as
where $y_i$ is the hidden layer, $Z_k$ is the output layer, $\varphi$ is the activation function, $X_j$ and $Y_k$ are the intermediate numerical results before applying the activation function, $x_i$ is the input layer, $\beta_{oj}$ and $\beta_{ok}$ are the bias weights, and $\beta_{ij}$ and $\beta_{kj}$ are the weights of the connections. The activation function in this study was the hyperbolic tangent sigmoid function. The hidden and output layers can be designated as
The following error function (EF) is applied for error backpropagation weight training
where $\varpi_k$ and $t_k$ are the error and target value for each node of the output. The objective is to minimize the above error function. The adjustment of weight between the hidden and output layers is
where $\mu$ is the learning rate, ranging from 0 to 1. The updated weight is then calculated using the following equation:
where $\upsilon$ is the iteration number. The gradient of EF between the input and hidden layers is
The updated weighting can be expressed as
Two evaluation metrics were used to assess the performance of the proposed method. The first is the root mean square error (RMSE), a widely recognized metric in predictive modeling. RMSE quantifies the average discrepancy between the predicted values and the actual observed data.
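The training loop and evaluation metrics described above can be illustrated with a small NumPy sketch of a one-hidden-layer network with a hyperbolic tangent activation. Several choices here are assumptions made for illustration and are not taken from the source: the linear output layer, the plain gradient-descent update (standing in for the learning-rate update described in the text), the hyperparameters, and the synthetic data. The function names `train_bpnn`, `rmse`, and `r_squared` are likewise hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def train_bpnn(X, t, hidden=10, lr=0.1, epochs=3000, seed=0):
    """Train a tanh hidden layer plus linear output by backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        y = np.tanh(X @ W1 + b1)       # hidden layer output
        z = y @ W2 + b2                # ASSUMPTION: linear output layer
        err = z - t
        losses.append(0.5 * float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean half squared error.
        grad_z = err / len(X)
        grad_W2 = y.T @ grad_z
        grad_b2 = grad_z.sum(axis=0)
        grad_pre = (grad_z @ W2.T) * (1.0 - y ** 2)   # tanh derivative
        grad_W1 = X.T @ grad_pre
        grad_b1 = grad_pre.sum(axis=0)
        # Gradient-descent weight update with learning rate lr.
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, losses


# Synthetic inputs standing in for three principal components.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 3))
t = np.tanh(X[:, :1] - 0.5 * X[:, 1:2]) + 0.3 * X[:, 2:3]
predict, losses = train_bpnn(X, t)
print(rmse(t, predict(X)), r_squared(t, predict(X)))
```

An RMSE near zero and a coefficient of determination near one both indicate close agreement; the former is expressed in the units of the target, while the latter is dimensionless.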
In this study, eight influential factors, encompassing monthly groundwater level variation, monthly electricity consumption variation, variation of average monthly precipitation, percentage of agricultural land use, percentage of fine-grained soil, length of the average maximum drainage path, total monthly electricity consumption, and total monthly precipitation, were included in the PCA. We employ PCA to assess the relationship between these eight influential factors and land subsidence. Utilizing PCA, we ascertain the significance of these influencing factors and their principal components in relation to land subsidence. To reconstruct the absent time-dependent land subsidence data using PCA-derived principal components, we employ the backpropagation neural network.

Results
The PCA is initially utilized to assess the relationship between the influencing factors and land subsidence.To reconstruct the missing time-varying land subsidence data based on the factors identified through PCA, we employ the BPNN.Detailed findings from this analysis are elaborated in the following sections.

Investigating the dominant factors and generating principal components
In this study, we adopt PCA to examine the dominant factors affecting subsidence and to generate principal components. The PCA results can be used to select the input variables for the BPNN. By identifying the principal components that explain the most variance in the data, we can choose the dominant factors affecting land subsidence as inputs for the neural network. This selection can enhance the network's training and predictive performance. The datasets of the three MLCWs at the STES, YCES, and NLPS from 2008 to 2021 were adopted.
As listed in Table 2, eight influential factors, denoted as factors 2 through 9, encompassing monthly groundwater level variation, monthly electricity consumption variation, variation of average monthly precipitation, percentage of agricultural land use, percentage of fine-grained soil, length of the average maximum drainage path, total monthly electricity consumption, and total monthly precipitation, were included in the PCA. Consequently, we applied PCA to evaluate the relationship between these eight influential factors and factor 1, representing monthly compaction change, which is indicative of land subsidence.
We first evaluate the relationship of the factors with land subsidence using the PCA. According to the correlation coefficient matrix listed in Table 2, factor 1 (monthly compaction change) is positively correlated with factor 2 (monthly groundwater level variation) and factor 3 (monthly electricity consumption variation of managed wells). Factor 2 has the highest correlation coefficient, 0.75, indicating that the variation of land subsidence is highly related to the fluctuation of groundwater level. Additionally, factor 3 (monthly electricity consumption variation) has a correlation coefficient of 0.61 with factor 1 (monthly compaction change), showing that land subsidence is significantly related to electricity consumption fluctuation.
Furthermore, the results indicate that factor 8 (total monthly electricity consumption) and factor 9 (total monthly precipitation) had a moderate positive correlation with factor 1 (monthly compaction change), with correlation coefficients of 0.48 and 0.29, respectively. Based on the PCA results, the primary factors influencing subsidence are identified as factor 2 (monthly groundwater level variation), factor 3 (monthly electricity consumption variation), factor 8 (total monthly electricity consumption), and factor 9 (total monthly precipitation).
Therefore, the above four factors have been selected for determining principal components in the PCA for the STES, YCES, and NLPS.
Table 3 lists the component loading values in the PCA for the STES, YCES, and NLPS, allowing us to assess the correlation between each factor and the PCs. From Table 3, it is found that factor 2 (monthly groundwater level variation) and factor 3 (monthly electricity consumption variation) for the STES, YCES, and NLPS exhibit correlations of 0.55 or higher with PC1. Similarly, factor 8 (total monthly electricity consumption) and factor 9 (total monthly precipitation) for the STES, YCES, and NLPS also have correlations of 0.4 or higher with PC1. Additionally, it appears that for the STES, YCES, and NLPS, PC2 is primarily influenced by factor 8 (total monthly electricity consumption).
Table 2. The covariance matrix for the three MLCWs at the STES, YCES, and NLPS.
Table 4 provides information on the eigenvalues and their contributions to the PCs. The representativeness of each PC in explaining the entire dataset is determined by its contribution rate. Upon analyzing the eigenvalues and contribution rates of the factors, it becomes evident that the cumulative contribution of the first three PCs exceeds 93% for each of the STES, YCES, and NLPS. This observation implies that the first three PCs collectively account for over 90% of the variance in the data, indicating a significant level of representativeness. These three PCs are subsequently employed as input variables for the BPNN analysis.
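The selection rule described above (retain the leading PCs whose cumulative contribution passes a chosen level) can be sketched as follows. The eigenvalues below are made up for illustration and are not the values in Table 4; the function names and the 90% threshold argument are likewise hypothetical.

```python
def cumulative_contribution(eigenvalues):
    """Cumulative share of total variance, eigenvalues taken in descending order."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        shares.append(running / total)
    return shares


def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of leading PCs whose cumulative share reaches threshold."""
    for count, share in enumerate(cumulative_contribution(eigenvalues), start=1):
        if share >= threshold:
            return count
    return len(eigenvalues)


# Illustrative eigenvalues only (NOT taken from Table 4).
eigenvalues = [2.0, 1.0, 0.75, 0.25]
print(cumulative_contribution(eigenvalues))
print(components_needed(eigenvalues, threshold=0.90))
```

With these illustrative values, the first three components are the smallest set whose cumulative contribution reaches the 90% threshold.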

Reconstructing the missing subsidence data using the BPNN
According to the WRA, the subsidence data from the MLCWs installed at STES, YCES, and NLPS are not available from March 2012 to March 2014. The missing time-varying subsidence data are reconstructed in this study using the BPNN. The proposed methodology was applied to reconstruct the missing subsidence data at STES, YCES, and NLPS in Yunlin County. In the BPNN, the discontinuity in the measured subsidence data is first recovered from the available data. The series with a gap of 24 months (from March 2012 to March 2014) is filled. These completed series are then used to predict other time series subsidence data. The predictions are based on learning from the complete data of the three MLCWs installed at STES, YCES, and NLPS. Parameters for the BPNN model are listed in Table 5.
We first train the BPNN using the monitored subsidence data spanning a 14-year period from 2008 to 2021 (specifically, February 2008 to February 2012 and April 2014 to June 2021). After the training phase of the BPNN, we test these subsidence data to recover the observations before extending the prediction to complete the sequence. The monitored subsidence data spanning from February 2008 to February 2012 and from April 2014 to June 2021 were randomly divided into training, testing, and validation datasets, with an allocation ratio of 70%, 15%, and 15%, respectively. All subsequent analyses related to hidden layers utilize a consistent count of 10. The Levenberg-Marquardt algorithm is used in the training phase of the BPNN. The PCs, as listed in Table 3, have been selected as input variables for the BPNN. The RMSE was calculated using the testing dataset to evaluate the impact of rainfall on the BPNN model's performance.
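The random 70%/15%/15% allocation described above can be sketched with the Python standard library. The function name `split_indices`, the fixed seed, and the sample count of 100 are illustrative assumptions; the source does not state how the random division was implemented.

```python
import random


def split_indices(n_samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition sample indices into training, testing, validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded for reproducibility
    n_train = round(ratios[0] * n_samples)
    n_test = round(ratios[1] * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]   # remainder goes to validation
    return train, test, validation


# Hypothetical number of monthly samples, for illustration only.
train_idx, test_idx, val_idx = split_indices(100)
print(len(train_idx), len(test_idx), len(val_idx))
```

Assigning the remainder to the last subset guarantees that every sample is used exactly once, even when the ratios do not divide the sample count evenly.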
The predictive accuracy of the BPNN is summarized in Table 6. Three scenarios of input variables are considered: (1) PC1; (2) PC1 and PC2; and (3) PC1, PC2, and PC3. Based on the RMSE of the testing dataset at STES, YCES, and NLPS, the scenario that uses all three PCs as input variables achieves good accuracy at all three sites.
Figure 11 illustrates the reconstruction of missing compaction data using the BPNN. It reveals that employing three PCs as input variables for the BPNN can successfully reconstruct missing compaction data. Consequently, this study generated a graphical representation of cumulative subsidence over the years. As depicted in Fig. 12, we compare the predicted subsidence data obtained using the BPNN model with the monitored subsidence data provided by the WRA 34. The results reveal good agreement between the predictions generated by the proposed BPNN model and the monitored subsidence data from the WRA 34.

Discussion
The assessment of the relationship between influencing factors and land subsidence in this study begins with PCA. Eight influential factors were included in the PCA: monthly groundwater level variation, monthly electricity consumption variation, variation of average monthly precipitation, percentage of agricultural land use, percentage of fine-grained soil, length of the average maximum drainage path, total monthly electricity consumption, and total monthly precipitation. Based on the PCA results, the primary factors influencing subsidence are identified as monthly groundwater level variation, monthly electricity consumption variation, total monthly electricity consumption, and total monthly precipitation. Therefore, these four factors are selected for determining the principal components.
The study's outcomes suggest that the BPNN approach presents itself as a practical and efficient alternative for predicting land subsidence. Its reliance on historical time-series data and the flexibility of not requiring highly detailed hydrogeological parameters make it accessible and applicable in a variety of real-world situations. Furthermore, the model's success in reconstructing missing data enhances its overall utility and robustness.
In summary, the results of the BPNN model demonstrate the effectiveness of the approach in accurately reconstructing subsidence data over extended time periods for these specific sites. This methodology has displayed promise in preserving key features of subsidence data, rendering it highly suitable for the selected areas.

Conclusions
In this article, we aim to address the challenge of reconstructing missing time-varying land subsidence data in the Choshui Delta, Taiwan. To accomplish this, we propose a novel algorithm that employs a multi-factorial perspective to effectively reconstruct the missing data. We consider eight factors, including the groundwater level data, electricity consumption data, precipitation data, land use pattern, sediment type, and drainage path length, which are known to significantly influence land subsidence. Through our analysis, we summarize the key findings as follows:
To assess the relationship between the eight influencing factors and land subsidence, an initial step involves employing PCA. The PCA results reveal that the monthly compaction change exhibits positive correlations with the monthly variation in groundwater level and the variation in electricity consumption of managed wells. Notably, the correlation with groundwater level variation is found to be the strongest. This indicates that the variability of land subsidence is closely associated with fluctuations in groundwater levels.
In the BPNN, the observed results demonstrate good agreement between the predictions generated by the proposed BPNN model and the historical subsidence data. The results reveal that the reconstruction of missing data using the BPNN approach effectively preserves the key features of the subsidence data. Furthermore, the results demonstrate that the proposed neural network model does not require sophisticated soil compaction parameters and complex hydrogeological modeling techniques. This finding highlights the

Figure 1. Significant subsidence areas in Taiwan in 2022. This figure was created using ArcGIS 10.6.1 software.

Figure 2. Location of the Choshui Delta. This figure was created using ArcGIS 10.6.1 software.

Figure 3. Accumulated subsidence from 2011 to 2020 at 0 to 60 m. This figure was created using ArcGIS 10.6.1 software.

Figure 4. Location of multi-layer compaction monitoring wells. This figure was created using ArcGIS 10.6.1 software.

Figure 6. Plot of monthly compaction change versus year.

Figure 7. Plot of monthly groundwater level variation versus year.

Figure 8. Plot of total monthly electricity consumption versus year.

Figure 9. Plot of total monthly precipitation variation versus year.

Figure 11. Reconstruction of missing compaction data using the three PCs in the BPNN.

Figure 12. Comparison of results with observed data from the WRA 34.

Table 1. Datasets in this study.

Table 3. The values of component loading in the PCA for the STES, YCES, and NLPS.

Table 4. Eigenvalue, rate of contribution and cumulative contribution of the principal components.

Table 5. Parameters used for the BPNN model.

Table 6. RMSE for the testing dataset using the PCs in the BPNN.