Introduction

In loess areas, deformation caused by collapsibility is a major distress mode in engineering construction. When loess is immersed in water under pressure, its structure is destroyed and the pores gradually narrow, eventually leading to collapse. Physical parameters have an important influence on the collapsibility of loess soils: water indicators, density, pore, burial depth, geostatic stress, and physical characteristics can all influence it1. It is therefore difficult for designers to determine which factor should be used to characterize collapsibility, and which standards should be applied to identify loess stability accurately.

Until recently, the collapse mechanism remained the focus of loess research. Owing to the special structure and characteristics of loess, the cause of collapsibility is still inconclusive. Early researchers believed that collapsibility was governed by a single factor, and theories including the soluble salt hypothesis and the colloidal deficiency hypothesis were proposed. As research continued, it became clear that many complex factors lead to collapsibility2,3,4. Studies of the influence of soil characteristics on collapsibility found that the possibility of collapse is related to the void ratio, water content, density and other indicators of the undisturbed soil. These conclusions have been confirmed in subsequent studies5,6,7,8,9. For example, as the initial water content and dry density increase, collapsibility tends to decrease7,8,9,10,11. It is widely accepted that differences in collapse behavior are also reflected in the microstructure of the soil12,13. Numerous studies have shown that changes in the accumulation state of loess can also affect collapsibility14. Based on these results, researchers have tried to use soil parameters to predict collapsibility. In regional studies, the predicted results are generally effective in conventional geotechnical engineering practice15.

Collapsibility is still one of the most difficult engineering geological problems to predict. According to incomplete statistics, more than 400 large-scale ground subsidence events in Northwest China have been caused by the collapse of loess. In the Heifangtai area of Gansu, China, foundation deformation has forced each family to renovate its house twice on average16. Correctly evaluating the collapsibility level of loess therefore has great engineering significance. Currently, two evaluation coefficients are used to identify the collapsibility of loess: δzs and δs. Both are usually obtained from the indoor immersion compression test. Although this test is far less expensive than the field immersion deformation test, the large number of experiments and the complicated process remain obstacles to predicting collapsibility at this stage2. In recent years, researchers have tried to use machine learning or artificial intelligence methods to predict collapsibility17,18. However, few studies have examined the multiple association rules between influencing factors and collapsibility on large datasets.

Xining, in Qinghai Province, China, is located on the edge of the Qinghai-Tibet Plateau and is an important city in the “Belt and Road” policy. A large amount of data has been accumulated from past construction projects in this area. How to summarize these data to serve future construction has become a common concern for many engineers and researchers. Big data analysis has rarely been applied to the engineering characteristics of loess in Xining. Therefore, this paper uses the Apriori algorithm to mine the relationship between physical parameters and collapsibility levels in 1039 loess samples, aiming to discover multiple association rules between them. Analysis of these association rules yields quantitative information, including evaluation criteria for collapsibility in this area, which can be used to reduce the workload of determining collapsibility. To achieve that, three steps are required: (1) identify the representative indicators that lead to collapsibility from 13 potential factors; (2) analyze the association rules of each factor with δs (coefficient of collapsibility) and δzs (coefficient of collapsibility under overburden pressure) separately; and (3) provide evaluation criteria for collapsibility and constructive recommendations for project construction based on the results obtained.

The following sections describe the procedure and methodology in detail. The first step is to collate the data in the engineering survey reports, thereby establishing a dataset that includes 13 potential factors together with δs and δzs. Second, the original data are preprocessed by noise reduction, normalization and discretization. The third step is to identify the key factors that lead to collapsibility using Grey relational analysis. Subsequently, the key factors, δs and δzs are taken as input items, and the Apriori algorithm is used to find multiple association rules. Compared with previous research, the results obtained from information mining based on big data are more reliable. Finally, based on the analysis of the association rules, evaluation criteria for collapsibility are proposed, which can assist engineering geological surveys in the area and reduce the workload of indoor experiments.

Study site and data

Description of the study site

Xining, in Qinghai Province, China, is located in the transitional zone between the Loess Plateau and the Qinghai-Tibet Plateau. During the Cenozoic, the area accumulated thick and continuous loess, nearly 25 m thick19. The study area is located in the Chengbei District, Xining, covering an area of about 137.7 km² between latitudes 36°64′42″–36°69′65″N and longitudes 101°74′29″–101°76′45″E. The altitude increases from Northwest to Southeast and varies in the range from 2755 to 2215 m. The study area has an alpine plateau climate, characterized by low pressure, low rainfall, large evaporation, a long freezing period and a large temperature difference between day and night. According to the China Meteorological Administration, the temperature in the region varies between −26.6 °C in winter and 38.7 °C in summer, with an annual average of 5.7 °C. Rainfall is about 7 mm in winter and 255 mm in summer, with an annual average of 371 mm. According to the engineering geological survey report, Quaternary strata are the major strata in the region, and all samples are Q4 loess. The soil is a collapsible silty soil with high compressibility and low strength; the pores are arranged in disorder, and there is calcium powder on the pore walls20,21.

Data

The data used in this study to mine the association rules of loess collapsibility originated from six construction projects. According to the Raida criterion, all data meet the statistical requirements. The specific locations of those projects are shown in Fig. 1 and Table 1.
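The Raida criterion is commonly understood as the 3σ rule: a value is treated as a gross error when it lies more than three standard deviations from the mean. The sketch below assumes that interpretation; the function name and the sample values are hypothetical and are not taken from the survey data.

```python
from statistics import mean, stdev


def raida_outliers(values):
    """Return the indices of values farther than 3 sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]


# Hypothetical natural densities (g/cm^3) with one implausible reading at the end
densities = [1.60] * 10 + [1.62] * 10 + [5.00]
flagged = raida_outliers(densities)
print(flagged)
```

With small samples the 3σ band is wide, so the check is most meaningful on datasets of roughly the size used here (about a thousand samples per factor).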

Figure 1

Geographic location of the data collection. Created with ArcGIS 10.6 (https://www.arcgis.com/index.html) and Baidu Map 15.0 (https://map.baidu.com).

Table 1 Detailed data on study sites.

Many physical parameters can lead to collapsibility. These potential factors can be divided into six categories: water indicators, density, pore, burial depth, geostatic stress, and physical characteristics. In this study, the dataset includes 13 potential influence factors for 1039 samples (details are shown in Fig. 2), which originated from six construction projects in the Chengbei District. The original data then need to be preprocessed by noise reduction, normalization and discretization.

Figure 2

Physical parameters of soil samples.

Collapsibility of loess soils

Loess has long been used as a foundation in construction projects, and its collapsibility has always been a typical engineering geological problem in the loess region, often causing severe damage to buildings and construction activities in its distribution area. δs and δzs are important indicators for evaluating the collapsibility of loess, and both play important roles in engineering construction in loess areas. Details of δs and δzs in the study area are shown in Fig. 3.

Figure 3

Collapsibility level (δs and δzs) distribution under different burial depths.

Coefficient of collapsibility

The coefficient of collapsibility is an index for measuring the degree of collapsibility of a soil mass after immersion in water under a given pressure. It is obtained from the indoor confined compression test, and the coefficient of collapsibility (δs) is defined as:

$$\delta_{s} = \frac{h_{p} - h_{p}^{\prime}}{h_{0}}$$
(1)

where h0 is the thickness (mm) of the soil sample at natural humidity and structure; hp is the thickness (mm) of the soil sample after settlement has stabilized under the pressure p; and h′p is the thickness (mm) of the pressure-stabilized sample after further settlement under water immersion has stabilized. The pressure p is measured from the bottom of the foundation (from 1.5 m below the ground in a preliminary survey): 200 kPa is used within 10 m, and the saturated self-weight pressure of the overlying soil is used from 10 m down to the top of the non-collapsible soil layer (300 kPa is used when this pressure exceeds 300 kPa).

Coefficient of collapsibility under overburden pressure

The coefficient of collapsibility under overburden pressure is the ratio of the settlement of the loess sample to its original height under the saturated self-weight of the soil. It is an important index for judging self-weight collapse. The coefficient of collapsibility under overburden pressure (δzs) can be expressed as follows:

$$\delta_{zs} = \frac{h_{z} - h_{z}^{\prime}}{h_{0}}$$
(2)

where hz is the thickness (cm) of the soil sample after settlement has stabilized under the saturated self-weight pressure of the overlying soil, and h′z is the thickness (cm) of the pressure-stabilized sample after further settlement under water immersion has stabilized.
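Equations (1) and (2) share the same form: the loss of thickness on immersion divided by the original thickness. The following minimal sketch evaluates both; the function name and the specimen readings are hypothetical and serve only to illustrate the arithmetic.

```python
def collapsibility_coefficient(h_before, h_after, h0):
    """Eq. (1)/(2): (stabilized thickness before immersion - thickness after immersion) / original thickness.

    All three thicknesses must use the same unit, so the result is dimensionless.
    """
    if h0 <= 0:
        raise ValueError("h0 must be positive")
    return (h_before - h_after) / h0


# Hypothetical readings in mm for a 20 mm specimen (illustrative, not measured data)
delta_s = collapsibility_coefficient(h_before=19.40, h_after=18.80, h0=20.0)   # under pressure p
delta_zs = collapsibility_coefficient(h_before=19.70, h_after=19.55, h0=20.0)  # under self-weight pressure
print(round(delta_s, 4), round(delta_zs, 4))
```

Because both coefficients are ratios of thicknesses, the choice of mm or cm does not matter as long as it is consistent within one calculation.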

Research methodology

Grey relational analysis

Grey relational analysis (GRA) is a method that uses the Grey Relation Order (GRO) to describe the strength of the association between factors; it was proposed by Tan and Deng. The method is widely used in industry, economics, management and other disciplines, and has achieved remarkable results. In this paper, GRA is used to find the significant factors in each category, which then serve as the input items of the Apriori algorithm.
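The text does not spell out the GRA formulas. The sketch below assumes the commonly used form of Deng's grey relational grade: each series is min–max normalized, the grey relational coefficient is (Δmin + ρΔmax)/(Δ + ρΔmax) with a distinguishing coefficient ρ = 0.5, and the grade is the mean coefficient. The function names and the toy series are hypothetical.

```python
def min_max(series):
    """Scale a series to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def grey_relational_grades(reference, factors, rho=0.5):
    """Return {factor name: grey relational grade} with respect to the reference series."""
    ref = min_max(reference)
    # Absolute differences between the normalized reference and each normalized factor
    deltas = {
        name: [abs(r - x) for r, x in zip(ref, min_max(values))]
        for name, values in factors.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        return {name: 1.0 for name in deltas}
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }


# Toy data: a collapsibility coefficient and two candidate factors (illustrative only)
delta_s = [0.01, 0.03, 0.05, 0.07]
candidates = {
    "void_ratio": [0.8, 0.9, 1.0, 1.1],  # moves in step with delta_s
    "unrelated": [3.0, 1.0, 4.0, 2.0],   # no clear relation to delta_s
}
grades = grey_relational_grades(delta_s, candidates)
print(grades)
```

In this form the factor with the largest grade in each category would be the one passed on to the rule mining step. Factors that vary inversely with the reference would need a reversed normalization first, which this sketch does not attempt.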

Apriori algorithm

Association rule analysis is an important technique in data mining. By using association analysis to find frequent itemsets in the data, the structural characteristics of the data are revealed. The Apriori algorithm is a classic algorithm for finding frequent itemsets and generating association rules from them. In essence it is an iterative layer-by-layer search, in which each layer is divided into two stages: generating candidate sets and checking their support. When applying the Apriori algorithm, researchers can adjust the thresholds of the screening indicators, Support and Confidence, to ensure the practicability of the results. Hence, in this paper, the Apriori algorithm is used to investigate the correlation between influencing factors and collapsibility levels. The implementation steps of the Apriori algorithm are shown in Fig. 4.
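The layer-by-layer search can be sketched in a few lines. This is a minimal illustration of the general Apriori technique, not the implementation used in the study; the function names, the toy samples and the thresholds are hypothetical, and the item codes merely imitate the style of discretized levels.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    frequent = {}
    current = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while current:
        # Stage "check support": keep the size-k candidates that meet the threshold
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Stage "generate candidates": join frequent k-itemsets into (k+1)-itemsets
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that contain an infrequent (k-1)-subset
        current = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules


# Toy samples: each set holds the discretized levels of one soil sample (illustrative only)
samples = [
    {"E1", "D4", "C1"}, {"E1", "D4", "C1"}, {"E1", "D3", "C1"},
    {"E3", "D2", "C2"}, {"E3", "D2", "C2"}, {"E1", "D2", "C2"},
]
frequent_itemsets = apriori(samples, min_support=0.3)
rules = generate_rules(frequent_itemsets, min_confidence=0.7)
```

Raising `min_support` discards rare combinations early, which is what keeps the candidate sets small on larger datasets.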

Figure 4

Flowchart of the Apriori computer procedure.

Results and discussion

Determine the model input item

The 13 potential factors can be divided into six categories according to their characteristics: water indicators, density, pore, burial depth, geostatic stress, and physical characteristics. Because the amount of data is large, correlation analysis is needed to identify the most important factors in each category.

The Grey Relational Grade (GRG) between each influencing factor and δs or δzs is shown in Fig. 5. In the pore category, porosity and void ratio are positively correlated with collapsibility, which may be interpreted as more pores in the loess making collapse more severe. Physical characteristics are negatively correlated with collapsibility. According to the GRGs, the plasticity index (IP) is the most significant influence factor within the physical characteristics group. In the density category, the natural density has the maximum GRG. Among the water indicators, saturation has a higher correlation than water content for the coefficient of collapsibility (δs), while the reverse is true for the coefficient of collapsibility under overburden pressure (δzs).

Figure 5

Grey relational grades between influence factors and δs or δzs.

Based on the Grey Relational Grades, the most important factors in each category are selected as input items of the Apriori algorithm: (1) for δs: saturation, natural density, void ratio, plasticity index, geostatic stress and burial depth; (2) for δzs: water content, natural density, porosity (converted to void ratio for statistical purposes), plasticity index, geostatic stress and burial depth. The Apriori algorithm is then used to mine association rules.

Result analysis of Apriori algorithm

Before being used as input items of the Apriori algorithm, the data are preprocessed. The first step, normalization, uses the min–max normalization method. The second step, discretization, uses a clustering algorithm to separate each factor into four categories according to different ranges. The preprocessing results are shown in Table 2, where each factor is divided into four quantization levels. For example, D1 means Level 1 (≤ 1.54 g/cm³) of the natural density.
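The two preprocessing steps can be illustrated as follows. The clustering algorithm is not named in the text, so this sketch assumes a simple one-dimensional k-means with k = 4 and a deterministic quantile start; the function names and the density values are hypothetical.

```python
def min_max_normalize(values):
    """Min-max normalization: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def kmeans_1d_levels(values, k=4, iterations=100):
    """Cluster 1-D values into k groups; return a level 1..k per value, ordered by cluster centre."""
    ordered = sorted(values)
    # Deterministic start: centres taken at evenly spaced quantiles
    centres = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centre
        new_centres = [sum(g) / len(g) if g else centres[j] for j, g in enumerate(groups)]
        if new_centres == centres:
            break
        centres = new_centres
    # Rank clusters by centre so that level 1 is the lowest range
    rank = {j: r + 1 for r, j in enumerate(sorted(range(k), key=lambda j: centres[j]))}
    return [rank[min(range(k), key=lambda j: abs(v - centres[j]))] for v in values]


# Hypothetical natural densities in g/cm^3 (illustrative only)
densities = [1.45, 1.50, 1.52, 1.58, 1.60, 1.63, 1.70, 1.72, 1.75, 1.85, 1.88, 1.90]
normalized = min_max_normalize(densities)
levels = kmeans_1d_levels(normalized, k=4)
codes = [f"D{level}" for level in levels]  # e.g. "D1" for the lowest density range
print(codes)
```

The resulting codes play the role of the items that the rule mining step works on; the actual level boundaries for this study are the ones reported in Table 2.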

Table 2 Classification standards for collapsibility and key influence factors.

Two kinds of association rules obtained by the Apriori algorithm are analyzed in this paper: (1) among the rules with a high confidence level, those with the highest support; and (2) rules whose confidence is 100%.

For mining category (1), the results are shown in Table 3. It is worth pointing out that the thresholds of support and confidence here are 4% and 70%, respectively.
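For reference, the two screening indicators can be written in the standard form used for association rules. The notation is introduced here for illustration and is an assumption rather than the paper's own: N is the total number of samples, N(A) is the number of samples satisfying the antecedent A, and N(A ∩ B) is the number satisfying both A and the consequent B.

$$\mathrm{Support}(A \to B) = \frac{N(A \cap B)}{N}, \qquad \mathrm{Confidence}(A \to B) = \frac{N(A \cap B)}{N(A)}$$

A support threshold therefore filters out rules that cover too few samples, while a confidence threshold filters out rules whose consequent does not hold often enough when the antecedent does.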

Table 3 Association rules between influence factors and collapsibility (high support with relatively high confidence).

As mentioned above, the Support represents the probability that A and B occur simultaneously, and the Confidence is the probability that B will occur if A occurs. By analyzing them, the specific influence of factor changes on collapsibility can be obtained, so as to explore the fundamental principles of loess collapsibility.

(1) For δs, Support (E1 → C1) = 8.71% signifies that in 8.71% of instances a void ratio ≤ 0.8106 and non-collapsibility appear at the same time. Confidence (E1 → C1) = 85.38% indicates that if the void ratio is at most 0.8106, the probability of non-collapsibility is 85.38%. Confidence (D4 ∩ S4 → C1) = 92.73% indicates that highly saturated soil with high natural density is much less likely to be collapsible. Confidence (D2 ∩ G3 → C2) = 85.98% reveals that medium-density loess samples with a geostatic stress of 164.3–252.3 kPa are slightly collapsible with a probability of 85.98%, although the support of this rule is only 4.43%. Comparing Confidence (D4 → C1) = 88.64% with Confidence (D3 → C1) = 73.39% shows that a higher natural density can decrease the risk of collapsibility from 15.34 to 0%.

(2) For δzs, Confidence (D4 → CO1) = 90.15% and Confidence (D3 → CO1) = 72.94% show that, compared with δs, a change in density has a greater impact on δzs. Confidence (H1 ∩ P3 → CO2) = 70.23% reveals that high-plasticity loess buried at less than 3 m is slightly collapsible with a probability of 70.23%, although the support of this rule is only 4.42%.

Table 4 lists the association rules for mining category (2). The support of each rule in the list ranges from 0.86 to 2.60%, and the confidence is 100%. For an association rule to reach 100% confidence, at least two constraints are required, which means that predicting collapsibility from a single factor is not accurate enough.

Among these rules: (1) For δs, Support (E1 ∩ G4 → C1) = 2.60% and Confidence (E1 ∩ G4 → C1) = 100% indicate that 2.60% of cases have a geostatic stress greater than 252.3 kPa and a void ratio ≤ 0.8106, all of which are non-collapsible. Confidence (D3 ∩ E1 ∩ G4 → C1) = 100% likewise indicates that samples with natural density = D3, void ratio = E1 and geostatic stress = G4 are non-collapsible. Confidence (D2 ∩ E3 ∩ G3 ∩ S3 → C2) = 100% indicates that medium-density, high-void-ratio soil with a saturation of 33.5–48% is prone to slight collapsibility when the geostatic stress is 164.3–252.3 kPa. (2) For δzs, Confidence (H4 ∩ W4 → CO1) = 100% indicates that if the water content is greater than 20.4% and the burial depth exceeds 9 m, the loess is most likely non-collapsible.

According to Table 4, these association rules with 100% confidence can help determine collapsibility. For δs (coefficient of collapsibility), the loess can be judged non-collapsible when any of the following conditions occurs: void ratio = E1 (≤ 0.816) and geostatic stress = G4 (> 252.3 kPa); burial depth = H4 (> 9 m) and saturation = S4 (> 67.8%); plasticity index = P1 (≤ 7.3) and saturation = S4 (> 67.8%); or natural density = D4 (> 1.8 g/cm³) and plasticity index = P1 (≤ 7.3). In addition, when the saturation is 33.5–48% and the geostatic stress is 164.3–252.3 kPa, the loess sample is 83% likely to be slightly collapsible. For δzs (coefficient of collapsibility under overburden pressure), the loess can be judged non-collapsible when either of the following conditions occurs: void ratio = E01 (≤ 0.789) and plasticity index = P1 (≤ 7.3); or burial depth = H4 (> 9 m) and water content = W4 (> 20.4%). Conversely, there is a 70% possibility that the loess is slightly collapsible when the burial depth is less than 3 m and the plasticity index is 8.2–10.2.

If the natural density of the loess is greater than 1.8 g/cm³ and the plasticity index is less than 7.3, the loess is non-collapsible in terms of both δs and δzs. However, when the natural density is 1.54–1.66 g/cm³ and the geostatic stress is 164.3–252.3 kPa, the possibility of slight collapsibility in terms of δs and δzs is 86%.
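The 100%-confidence conditions listed above for δs lend themselves to a simple screening check. The sketch below encodes only the four non-collapsibility conditions as stated in the text, using the thresholds quoted there; the function and parameter names are hypothetical, and a `False` result means only that none of these rules applies, not that the sample is collapsible.

```python
def ds_rule_says_non_collapsible(void_ratio, geostatic_stress_kpa, burial_depth_m,
                                 saturation_pct, plasticity_index, natural_density):
    """Return True if any 100%-confidence non-collapsibility rule for delta_s matches."""
    return (
        (void_ratio <= 0.816 and geostatic_stress_kpa > 252.3)   # E1 and G4
        or (burial_depth_m > 9 and saturation_pct > 67.8)        # H4 and S4
        or (plasticity_index <= 7.3 and saturation_pct > 67.8)   # P1 and S4
        or (natural_density > 1.8 and plasticity_index <= 7.3)   # D4 and P1
    )


# Hypothetical samples (illustrative only)
dense_deep = ds_rule_says_non_collapsible(0.80, 260.0, 5.0, 40.0, 9.0, 1.60)
loose_shallow = ds_rule_says_non_collapsible(0.95, 200.0, 5.0, 40.0, 9.0, 1.60)
print(dense_deep, loose_shallow)
```

Samples that match none of the rules would still need the indoor immersion compression test, so a check of this kind can only reduce, not replace, the laboratory workload.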

Table 4 Association rules between influence factors and collapsibility (high confidence with relatively high support).

To facilitate single-factor analysis, Fig. 6 summarizes, for each factor, the confidence of non-collapsibility (C1 and CO1) at each of its four levels.

Figure 6

(a) Association rules between individual factors and δs within each category. (b) Association rules between individual factors and δzs within each category.

The second column of Fig. 6b shows that, for δzs, as factor D rises from level 1 (most unfavorable conditions) to level 4 (most favorable conditions), the confidence of condition CO1 increases from 12 to 90%. This means that as the natural density increases, the risk of collapsibility correspondingly decreases. A similar trend appears for factor E. If the void ratio of the loess is lower than 0.789, the probability of non-collapsibility is 88%; in contrast, if the void ratio is more than 1.012, the probability of non-collapsibility declines to 15%. Compared with H2 and H4, the confidence values are lower when factor H is at H1 or H3, which indicates that soils buried at depths of 6–9 m or less than 3 m have more serious collapsibility. There is no discernible difference between Confidence (P1 → CO1), Confidence (P2 → CO1) and Confidence (P3 → CO1), but Confidence (P4 → CO1) is much higher. A reasonable explanation for this result is that when the plasticity index is ≤ 10.2, the possibility of collapsibility increases significantly. For δs (Fig. 6a), the confidence for factors D, G, P and H shows the same tendency.

When the collapsibility is serious (C3, C4 or CO3, CO4), the confidence of each factor under the most favorable and the most unfavorable conditions is shown in Fig. 7. As can be seen from Fig. 7a, for each factor (E, D, G, H, S and P) the confidence of serious collapsibility (C3 or C4) under the most favorable level is significantly smaller than under the most unfavorable level, which means that when the physical parameters of the loess reach a certain threshold, the risk indicated by δs increases sharply. Conversely, when the physical parameters are below a certain threshold, there is little or no risk of serious collapse. Fig. 7a,b shows that, among all the factors studied, natural density is the key factor leading to serious collapsibility: if the natural density increases from under 1.059 g/cm³ to above 1.8 g/cm³, the probability of C3 or C4 decreases from 48 to 0%. In contrast, burial depth has little effect on collapsibility, with the probability changing only from 32 to 27%. Fig. 7b shows similar results for δzs. It is worth noting, however, that the probability of CO3 or CO4 in the worst case is lower overall than that of C3 or C4 for δs. In addition, the influence of burial depth on δzs is opposite to that on δs.

Figure 7

(a) Significant extracted association rules when δs = C3 or C4. (b) Significant extracted association rules when δzs = CO3 or CO4.

Conclusion

In this paper, the dataset used for the study included 13 influencing factors and 1039 samples from six construction projects in Chengbei District, Xining City, Qinghai Province, China. The Apriori algorithm was used to find multiple association rules of loess collapsibility. The following conclusions can be drawn:

  • The potential factors were divided into six categories according to their characteristics (water indicators, density, pore, burial depth, geostatic stress, and physical characteristics) in order to analyze their influence on the collapsibility of loess. The original data contain 13 potential influencing factors from six engineering construction projects in Chengbei District, Xining City, Qinghai Province, where the collapsibility of loess has a great negative impact on engineering design and construction.

  • The key influence factors on δs (coefficient of collapsibility) and δzs (coefficient of collapsibility under overburden pressure) were analyzed, and the association rules of the collapsibility level in this area were explored. These strong association rules can assist future research on collapsibility.

  • The key influencing factors in each category were identified by Grey relational analysis. The results indicate that saturation, natural density, void ratio, plasticity index, geostatic stress and burial depth are the key influence factors for δs, while water content, natural density, porosity, plasticity index, geostatic stress and burial depth are the key influence factors for δzs. These key factors, together with δs and δzs, were then taken as input items of the Apriori algorithm to find multiple association rules. The determination of key factors also provides suggestions for studies on predicting δs and δzs.

  • In the construction and design of engineering projects in this area, it should be noted that loess with a burial depth of 6–9 m or less than 3 m has higher collapsibility. In addition, natural density is the most critical factor leading to collapsibility among the physical parameters.

  • Using the Apriori algorithm, some strong association rules about the collapsibility of loess were found. Based on those rules, evaluation criteria for collapsibility in this area are proposed, which can be used to reduce the workload of determining collapsibility. For example, engineers can determine that a loess sample is non-collapsible when the geostatic stress is greater than 252.3 kPa and the void ratio is less than 0.816. If the natural density of the soil sample is 1.54–1.66 g/cm³ and the geostatic stress is 164.3–252.3 kPa, there is an 86% probability of slight collapsibility.