Introduction

High-speed railways (HSR) provide efficient and convenient passenger transportation and play an important role in stimulating regional economic and social development1,2. As of 2020, the length of HSR tracks in China had reached 38,000 km, spanning various terrains and climatic zones3,4. Consequently, constructing railways in environmentally challenging areas has become inevitable. The HSR subgrade is a soil structure directly exposed to the natural environment, and the impact of climate change on its long-term performance cannot be ignored. According to “The Global Climate in 2015–2019” released by the World Meteorological Organization and the “2019 China Climate Bulletin” released by the China Climate Center, extreme rainfall events are becoming increasingly frequent in China as the climate warms. Under the action of high-frequency train loads, the subgrade is more prone to various defects such as settlement5, uplift deformation6,7, frost damage8, and mud pumping9. The occurrence of these defects is uncertain and poses a significant threat to human life and property. Therefore, investigating their spatial patterns, defect mechanisms, and vulnerability indicators is of great significance for the early warning of defects and the planning of new HSR lines.

Traditional risk assessment for HSR subgrades has relied primarily on qualitative methods, including the Analytic Hierarchy Process (AHP) and expert scoring, while quantitative models such as the Information Value Model and the Frequency Ratio Model depend heavily on extensive supporting data. Machine learning has gained increasing attention because of its robust generalization ability, high processing efficiency, and ability to handle large datasets. For example, Wang et al. (2022) used two deep learning (DL) algorithms, a convolutional neural network (CNN) and a deep neural network (DNN), to map landslide susceptibility on a branch line of the Sichuan-Tibet Railway10. Liu et al. (2018) used Random Forest (RF) and historical disaster events from 1980 to 1998 to quantitatively analyze the susceptibility of existing and planned railroad systems in China to rainfall-triggered multi-hazards11. Huang et al. (2022) integrated four machine learning models, i.e., Bayesian Networks (BN), Decision Tables (DTable), Radial Basis Function Networks (RBFN), and Stochastic Gradient Descent (SGD), to delineate landslide-prone zones and thereby reduce the risk to railroad construction, maintenance, and transportation in Sichuan12. Huang et al. (2023) compiled seismic damage data for bridges, used RF to predict seismic damage levels, and used a two-parameter normal distribution function to draw empirical susceptibility curves for seismic damage risk assessment13. Sresakoolchai et al. (2023) developed a novel intelligent automated system based on machine learning pattern recognition for detecting and predicting the deterioration of railroad turnouts exposed to flood conditions14.
Although machine learning has been applied frequently in railroad safety risk assessment, existing work tends to focus on the disturbance of railroad operations by external small-scale disasters and overlooks how structural changes in the subgrade itself, within its climatic and geographical environment, affect overall railroad risk.

Selecting the driving factors of defects is a crucial step in predicting the risk of railway subgrade failure in China under long-term environmental change. Previous studies indicate that subgrade defects are influenced by rainfall, temperature, geological conditions, and land use patterns. For example, rainfall causes soft, weak mudstone to absorb water, resulting in arching of the subgrade. Water infiltration from the surface into the subgrade soil reduces its shear strength, leading to various defects in the subgrade. Thus, the occurrence of defects is the outcome of multiple factors acting under specific conditions. Zhang et al. (2016) conducted freeze–thaw tests on soil–cement mixtures from a construction site to explore the freeze–thaw susceptibility of closed and open systems15. The findings revealed that the frost heave rate was influenced by the initial water content before freezing and the replenishment of moisture during the freezing process. Using ABAQUS, Wan et al. (2022) established a vehicle-track-subgrade vertical dynamic coupled analysis model16. The study found that debonding easily occurred between the end of the base plate and the surface of the subgrade. As rainwater continuously infiltrated and saturated the surface of the subgrade, fine particles gradually migrated upward under the action of train loads and accumulated on the surface of the subgrade, leading to mud pumping. Although previous studies have explained the processes underpinning defect occurrence through field experiments and numerical simulations, the extent to which driving factors affect defects remains unknown. Further investigation is needed to clarify the correlation between different environments and different defects.

This study used a novel dataset of historical subgrade defect occurrences, together with machine learning methods, to reveal the relationship between defects and environmental factors in support of large-scale infrastructure risk assessment for HSR in China. The main objectives of this study are to (i) investigate the diverse impacts of environmental factors on multiple common subgrade defects, and (ii) spatially predict the co-occurrence risk of subgrade defects. Spatial risk maps of common subgrade defects were constructed, offering a decision-making basis for the safe management and spatial planning of HSR in China.

Methods and data

Historical HSR subgrade defect occurrences in China

We recently compiled an extensive georeferenced dataset of historical HSR subgrade defects17. The dataset was sourced from 24,735 peer-reviewed publications from 1999 to 2022 in both Chinese and English, and a quality control procedure was applied to remove duplicates and ensure accuracy18,19. Subsequently, a total of 661 georeferenced event records of eight defect types were selected, spanning provincial, municipal, county, township, and smaller scales. Notably, subgrade settlement (settlement values ranging from 5 to 2300 mm), frost damage (frost heave values ranging from 4 to 50 mm), uplift deformation (ranging from 5 to 122 mm), and mud pumping have the longest reporting history among the identified defect types. Their definitions are detailed in Table 1. The distribution of HSR subgrade defect records across Chinese prefectural-level administrative regions is illustrated in Fig. 1.

Table 1 Definition of major types of HSR subgrade defect.
Figure 1 Distribution of HSR subgrade defect records in China.

The distribution indicates that the occurrence of these defects can be closely related to the local climate and geological environment. For example, frost damage events are concentrated in the temperate zone of China, which is characterized by long, cold winters and high humidity throughout the year. Pore water between the soil particles in the subgrade freezes and forms ice layers, resulting in soil displacement and subgrade frost heave. Mud pumping events are concentrated in the southeastern part of China, where frequent heavy rainfall causes a large amount of rainwater to infiltrate into the subbase and reduce its bearing stiffness. Under the high-frequency dynamic loads of trains, mud pumping and, in severe cases, subgrade settlement can occur. Subgrade swelling and upheaval are closely related to the slight expansion of the fill material used. Within the same climatic zone, multiple defects often coexist, making the subgrade condition more complex.

Environmental driving factors

Climate variables

Average annual rainfall: Rainfall may alter the engineering properties of subgrade materials, thereby influencing the stability of the subgrade20.

Consecutive 5-day rainfall: This data serves as an index reflecting extreme rainfall21.

Number of days with maximum temperature exceeding 35 degrees Celsius: This data can serve as an indicator reflecting extreme high temperatures16.

Annual freezing days: Annual freezing days quantify the number of days in a region where water freezes, and it is a key factor influencing the occurrence of frost damage on roadbeds15.

Wind speed: Strong winds may erode road shoulders, leading to a reduction in subgrade width, with sleepers/track panels exposed, thereby affecting the stability of the railway track22.

Geomorphological variables

Elevation: Elevation defines the highest and lowest points within a region and is reported to relate to the occurrence of various defects; for example, a number of defects have been reported on the high-altitude Menyuan-Minle section of the Lanzhou-Urumqi HSR15.

Slope and aspect: HSR subgrades may have varying slopes, resulting in different temperatures inside and outside the subgrade, potentially leading to uneven settlement23,24. The slope gradient may have an impact on the flow of moisture, thereby disrupting the drainage of the subgrade25.

Geohydrological variables

Rock hardness: Harder rocks can provide better support for HSR subgrade3.

Distance to fault: Geological faults provide pathways for groundwater and surface precipitation, which can affect subgrade26.

Soil texture: Subgrade defects can be associated with the types and properties of surrounding soil27.

Average distance to river: The presence of rivers increases the amount of groundwater in the surrounding geological environment, thus affecting the performance of subgrade28.

Average distance to lake: Lakes increase the amount of groundwater in the surrounding geological environment, which can impact the performance of subgrade8.

Anthropogenic variables

Land use: Land use indirectly influences the occurrence of subgrade defects. Extracting groundwater in urban areas can lead to subgrade defects, while areas with multiple rock types can enhance the strength of subgrade and reduce settlement29.

Average distance to road: Road construction, as a human activity, can have an impact on railway lines30,31.

Variable sources and preparation

The average annual rainfall, consecutive 5-day rainfall, number of days with maximum temperature exceeding 35 degrees Celsius, annual freezing days, and wind speed data were sourced from the National Earth System Science Data Center (http://www.geodata.cn/), with a spatial resolution of 0.25° and a time range from 2007 to 2016. We obtained annual average rainfall data through kriging spatial interpolation. The remaining factors were summarized within specified regions using the zonal statistics function in ArcGIS, which reports the data values in tabular form. Elevation data were obtained from the Geospatial Data Cloud (http://www.gscloud.cn) with a resolution of 30 m. Slope and aspect data at a 30 m resolution were derived using the slope and aspect tools in ArcGIS. Land use data were sourced from the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (http://www.igsnrr.ac.cn), with an accuracy of 30 m, and we calculated the land use area within specified regions using the zonal statistics function in ArcGIS. The road, river, and lake data were extracted from OpenStreetMap, and we calculated the average shortest distance from railway lines to these features using ArcGIS. Rock hardness and fault data were provided by the Geological Survey Cloud of the China Geological Survey Bureau (https://geocloud.cgs.gov.cn/). We categorized geological formations into different intervals to determine the average rock hardness within the region. The average shortest distance from railway lines to geological faults was calculated using ArcGIS. Soil texture data were sourced from the Harmonized World Soil Database (version 1.2) (https://www.fao.org/home/en/), from which we selected four soil attributes through filtering: soil drainage capacity, soil composition, soil effective water storage capacity, and soil depth.

Methods

Data processing

The variables were standardized using the StandardScaler module, and the hyperparameters of the RF were optimized using grid search to build a screening model32. To streamline the model and enhance its performance, recursive feature elimination was used to remove the environmental variables with minimal contribution33,34. Specifically, an RF model was established iteratively 18 times, and the least important environmental factor was eliminated in each round based on its contribution. To prevent important factors from being eliminated incorrectly, a criterion was set so that the contribution of an eliminated factor did not exceed 0.005. The remaining predictor factors were then reintroduced into the model. Finally, 55 of the initial 73 factors were retained for each type of defect to construct the risk prediction model. All the factors are shown in Table 2.
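
The screening loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `standardize` reproduces per-feature z-scoring (zero mean, unit population variance), `recursive_elimination` and `toy_importance` are hypothetical names, and the toy importance function stands in for refitting an RF and reading its factor importances in each round. The factor names and weights are invented for the example.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score one feature column: subtract the mean, divide by the population std."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


def recursive_elimination(features, importance_fn, max_rounds=18,
                          max_drop_importance=0.005):
    """Drop the least important feature once per round.

    importance_fn(features) must return {feature: importance}; in practice it
    would refit the model on the remaining features (assumption). A feature is
    only dropped if its importance does not exceed max_drop_importance.
    """
    remaining = list(features)
    for _ in range(max_rounds):
        if len(remaining) <= 1:
            break
        scores = importance_fn(remaining)
        weakest = min(remaining, key=lambda name: scores[name])
        if scores[weakest] > max_drop_importance:
            break  # guard: never eliminate a factor above the threshold
        remaining.remove(weakest)
    return remaining


# Hypothetical raw weights, renormalised over whichever features remain.
RAW_WEIGHTS = {
    "annual_freezing_days": 0.5,
    "annual_rainfall": 0.3,
    "elevation": 0.196,
    "aspect_north": 0.003,
    "wind_speed": 0.001,
}


def toy_importance(features):
    total = sum(RAW_WEIGHTS[name] for name in features)
    return {name: RAW_WEIGHTS[name] / total for name in features}


scaled = standardize([1.0, 2.0, 3.0])
kept = recursive_elimination(list(RAW_WEIGHTS), toy_importance)
print(kept)
```

With 73 starting factors and one elimination per round, 18 rounds leave 55 factors, which matches the counts reported in the text.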

Table 2 Detailed description of factors.

Random forest modelling

The RF model is one of the most commonly used ensemble algorithms in applied machine learning studies35,36. It uses repeated independent sampling to draw multiple samples from the original dataset and constructs a decision tree for each sample. These decision trees are then aggregated by voting, with each tree acting as a member, to achieve classification and prediction. In this study, the Random Forest algorithm is a crucial tool for predicting the risk of subgrade defects in HSR infrastructure. Its capacity to process extensive datasets with various input variables and its robustness against overfitting make it well suited to this task. Furthermore, as a non-parametric model, RF does not require assumptions about any specific form of relationship between variables, offering a significant advantage in examining the complex and not yet fully understood interplay between environmental factors and subgrade defects. Applying RF allows us to capture non-linear relationships and variable interactions that traditional statistical methods might overlook. Finally, RF is widely recognized as effective in identifying and determining variable importance. As a result, this approach has been successfully applied in the past for mapping landslides, debris flows, and many other types of disasters28,29,37.

During node splitting, RF records the decrease in the Gini index \({D}_{Gk}\) attributable to evaluation factor k. The importance of evaluation factor k is determined by summing \({D}_{Gk}\) over all nodes in the forest and averaging over all trees. The resulting measure is the average decrease in the Gini index for that factor, expressed as a share of the total average decrease in the Gini index for all factors. It is calculated according to Eq. (1):

$${P}_{k}=\frac{\sum_{h=1}^{n}\sum_{j=1}^{l}{D}_{Gkhj}}{\sum_{k'=1}^{m}\sum_{h=1}^{n}\sum_{j=1}^{l}{D}_{Gk'hj}}\tag{1}$$

where m, n, and l represent the total number of evaluation factors, the number of classification trees, and the number of nodes in a single tree, respectively. \({D}_{Gkhj}\) refers to the decrease in the Gini index at the jth node of the hth tree for the kth evaluation factor, and the index k′ in the denominator runs over all m factors. \({P}_{k}\) denotes the importance of the kth evaluation factor relative to all evaluation factors.
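
As a worked illustration of Eq. (1), the sketch below computes the normalized importance from per-node Gini decreases. The function name, the nested-dictionary layout, and the numbers are assumptions made for the example; they are not taken from the study. Averaging over the n trees is omitted because the factor 1/n appears in both numerator and denominator and cancels.

```python
def normalized_gini_importance(decreases):
    """Return P_k for every factor k.

    decreases[k] is a list with one entry per tree; each entry is the list of
    Gini decreases at the nodes of that tree that split on factor k.
    """
    totals = {
        factor: sum(sum(tree_nodes) for tree_nodes in trees)
        for factor, trees in decreases.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no Gini decrease recorded for any factor")
    return {factor: total / grand_total for factor, total in totals.items()}


# Hypothetical forest with two trees and two factors.
example_decreases = {
    "elevation": [[0.10, 0.05], [0.15]],   # summed decrease: 0.30
    "slope": [[0.05], [0.05, 0.10]],       # summed decrease: 0.20
}
importance = normalized_gini_importance(example_decreases)
print(importance)
```

By construction the values sum to 1, so each can be read as the share of the total Gini decrease attributable to one factor (here roughly 0.6 for elevation and 0.4 for slope).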

When constructing the RF models, the dataset was divided in a 7:3 ratio for training and validation. To enhance the robustness of model predictions and quantify the uncertainty, we employed an ensemble of 50 models trained on separate bootstraps of the dataset. The hyperparameters of each of the 50 individual models were determined using grid search, with random combinations of parameters, while all other tuning parameters were set to their default values. The combination with the highest average accuracy across the models was selected as the optimal parameter choice for the model. Furthermore, a five-fold cross-validation strategy was employed, whereby the training dataset was divided into 5 equal subsets, with 4 subsets used for model training and the remaining subset used for testing. This five-fold process was repeated, rotating the testing subset, in order to use all the training data for both training and testing while mitigating the impact of overfitting. To minimize the influence of randomness, 50 models were trained for each defect type. Each of these 50 models predicted the environmental risk on a continuous scale ranging from 0 to 1, and the final prediction map was generated by calculating the average prediction across all models.
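
The training procedure can be sketched as follows. This is a hedged illustration rather than the study's code: it assumes scikit-learn and NumPy are available, uses synthetic data from `make_classification` in place of the defect dataset, uses an illustrative two-by-two parameter grid, and, as a simplification, runs the grid search once and reuses the best parameters for all 50 bootstrap models instead of tuning each model separately.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.utils import resample

# Synthetic stand-in data (assumption): 300 samples, 10 factors, binary label.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)

# 7:3 split into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Grid search with five-fold cross-validation; the grid itself is illustrative.
param_grid = {"n_estimators": [25, 50], "max_depth": [4, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Ensemble of 50 forests, each fitted on its own bootstrap of the training set.
N_MODELS = 50
val_probs = []
for seed in range(N_MODELS):
    X_boot, y_boot = resample(X_train, y_train, replace=True, random_state=seed)
    model = RandomForestClassifier(random_state=seed, **search.best_params_)
    model.fit(X_boot, y_boot)
    val_probs.append(model.predict_proba(X_val)[:, 1])  # P(defect) in [0, 1]

val_probs = np.vstack(val_probs)        # shape: (50, number of validation rows)
mean_risk = val_probs.mean(axis=0)      # ensemble prediction per sample
risk_spread = val_probs.std(axis=0)     # spread across models as uncertainty
ensemble_auc = roc_auc_score(y_val, mean_risk)
print(search.best_params_, round(float(ensemble_auc), 3))
```

Averaging the 50 probability vectors gives the continuous 0 to 1 risk score described above, and the standard deviation across models offers one simple way to express the uncertainty of that score.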

The model’s classification accuracy is analyzed using the Receiver Operating Characteristic (ROC) curve38,39,40, depicting the true positive rate on the vertical axis and the false positive rate on the horizontal axis. Greater accuracy in model classification is indicated by a higher true positive rate and a lower false positive rate. The ROC curve is generated by plotting the true positive rate (proportion of correctly identified defect samples) against the false positive rate (proportion of falsely identified non-defect samples).
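
For readers who want to see how an ROC curve and its area are obtained from model scores, here is a small dependency-free sketch. The function names and the six-sample example are hypothetical; labels are 1 for defect samples and 0 for non-defect samples, and the area is computed with the trapezoidal rule.

```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, from (0, 0) to (1, 1)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical validation labels and predicted defect probabilities.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc_points = roc_curve_points(example_labels, example_scores)
auc_value = auc_trapezoid(roc_points)
print(roc_points[-1], auc_value)
```

In this example eight of the nine defect and non-defect pairs are ranked correctly, so the area under the curve is 8/9.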

Integrated risk map generation

The integrated HSR infrastructure risk assessment involves a holistic analysis that encompasses multiple subgrade defects that are most commonly reported in China, such as settlement, frost damage, uplift deformation, and mud pumping. This approach takes into account the cumulative impact of various factors—including climatic conditions, geomorphological features, geohydrological characteristics, and human activities—on the subgrade's safety. In regions where the integrated risk scores are relatively high, an enhanced need for coordination and management emerges to effectively mitigate potential risk.

To quantify this integrated risk, we used the Random Forest (RF) model to evaluate the probability of each defect type occurring, averaging the outcomes across 50 iterations. Natural breakpoints were then used to divide each defect into four risk levels: low, low-medium, medium–high, and high28,41,42. Portions with average probability values greater than 0.6 for each defect were selected and assigned a value of 1; otherwise, they were assigned 0. Spatial coupling of the four defects was then performed to produce a comprehensive risk map of railway subgrade defects in China (the low, medium, high, and very high risk areas in the map have values of 0, 1, 2, and 3, respectively, representing the risk level of the area). It is noteworthy that this map displays regions with high risks for all four defects (probability values greater than 0.6), thus necessitating extra attention in HSR operations and new HSR planning. All distribution maps in the figures were drawn with ArcGIS (v10.7, www.esri.com).
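
The thresholding and overlay step can be illustrated with a short sketch. The cell names, probabilities, and function name are hypothetical, and the overlay is taken here to be a simple sum of the four binary layers, which is an assumption about how the spatial coupling was done. With four layers the sum can in principle reach 4; how the counts map onto the four labelled classes is not specified in the text, so the sketch reports the raw count only.

```python
DEFECTS = ("settlement", "frost_damage", "uplift_deformation", "mud_pumping")
THRESHOLD = 0.6  # cells strictly above this mean probability are flagged


def co_occurrence_score(mean_probs, threshold=THRESHOLD):
    """Count how many defect types exceed the probability threshold in one cell."""
    return sum(1 for defect in DEFECTS if mean_probs[defect] > threshold)


# Hypothetical ensemble-mean probabilities for three map cells.
cells = {
    "cell_A": {"settlement": 0.72, "frost_damage": 0.81,
               "uplift_deformation": 0.65, "mud_pumping": 0.10},
    "cell_B": {"settlement": 0.30, "frost_damage": 0.05,
               "uplift_deformation": 0.20, "mud_pumping": 0.61},
    "cell_C": {"settlement": 0.60, "frost_damage": 0.10,
               "uplift_deformation": 0.15, "mud_pumping": 0.20},
}
integrated = {name: co_occurrence_score(probs) for name, probs in cells.items()}
print(integrated)
```

Note that a probability of exactly 0.6 is not flagged, because the criterion in the text is "greater than 0.6".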

Results

Evaluation of model predictive power

This study employed RF to evaluate the susceptibility to subgrade defects in China and verified the training accuracy (success rate) using ROC curves. The average AUCs for subgrade settlement, frost damage, uplift deformation, and mud pumping, obtained through 50 rounds of sampling, were 0.76, 0.96, 0.80, and 0.81, respectively, as shown in Fig. 2. The green line represents the average ROC curve, while the black lines represent the 50 individual ROC curves. These results demonstrate that the RF model has good predictive capability for generating risk maps of subgrade defects.

Figure 2 Model prediction evaluation using AUC values and ROC curve analysis: (a) subgrade settlement, (b) frost damage, (c) uplift deformation, and (d) mud pumping.

Predicted high risk areas for different subgrade defect types

Predicted high-risk areas for settlement defects are mainly concentrated on the Lanzhou-Urumqi HSR and the Shanghai-Nanjing Intercity Railway, with higher susceptibility in northwest and central China (Fig. 3a). Frost damage risks (Fig. 3b) were predicted to be primarily concentrated on the Harbin-Dalian HSR and the Lanzhou-Urumqi HSR, with higher susceptibility in northeastern and western China. Areas prone to uplift deformation (Fig. 3c) were predicted to be mainly concentrated on the Lanzhou-Urumqi HSR, while areas prone to mud pumping (Fig. 3d) were primarily concentrated on the Shanghai-Nanjing Intercity Railway and the Wuhan-Guangzhou HSR, with higher susceptibility in southeast China.

Figure 3 Predicted risk distribution of main HSR subgrade defects in China: (a) subgrade settlement, (b) frost damage, (c) uplift deformation, and (d) mud pumping. Means are shown for each ensemble of 50 RF models.

The integrated risk map (Fig. 4) shows that subgrade defects in China’s HSR are predominantly concentrated in the northeast, northwest, and central regions. The Lanzhou-Urumqi HSR has the highest likelihood of subgrade defects, which is closely correlated with the local climate and environmental conditions.

Figure 4 Integrated co-occurrence risk map of HSR subgrade defects in China.

Key environmental drivers of subgrade defect risk

The occurrence of each defect is influenced by multiple factors, each with a varying degree of impact. Using the “Gini coefficient” based on the RF model43, the factor importance was averaged over the 50 model runs to generate the final factor importance ranking for each defect, as shown in Fig. 5. We selected the top 10 most important factors for presentation. For settlement, the most important factors were elevation, slope, and land use-bare rock, with importance values of 0.063, 0.044, and 0.041, respectively. For frost damage, the most important factors were number of freezing days per year, annual average rainfall, and continuous 5-day cumulative rainfall, with importance values of 0.20, 0.082, and 0.076, respectively. For uplift deformation, elevation, continuous 5-day cumulative rainfall, and land use-bare rock had importance values of 0.081, 0.048, and 0.047, respectively. For mud pumping, the main driving factors were number of days with maximum temperature exceeding 35 degrees Celsius, annual average rainfall, and number of freezing days per year, with importance values of 0.077, 0.070, and 0.067, respectively.
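
The averaging and ranking step can be sketched as below. The function names and the two example runs are hypothetical; in the study the average would be taken over 50 runs and the ten highest-ranked factors would be reported.

```python
def mean_importance(runs):
    """Average each factor's importance over a list of {factor: importance} dicts."""
    factors = runs[0].keys()
    return {name: sum(run[name] for run in runs) / len(runs) for name in factors}


def top_factors(importance, n=10):
    """Return the n highest (factor, importance) pairs in descending order."""
    ordered = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return ordered[:n]


# Hypothetical importances from two model runs.
example_runs = [
    {"elevation": 0.06, "slope": 0.05, "wind_speed": 0.01},
    {"elevation": 0.07, "slope": 0.04, "wind_speed": 0.02},
]
averaged = mean_importance(example_runs)
ranking = top_factors(averaged, n=2)
print(ranking)
```

Averaging before ranking keeps a factor that happens to score highly in a single bootstrap from dominating the final list.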

Figure 5 Importance map of defect factors of HSR subgrade foundations in China: (a) subgrade settlement, (b) frost damage, (c) uplift deformation, and (d) mud pumping.

Discussion

Climatic impacts

Our results indicate that meteorological variables have a significant impact on subgrade defects, particularly frost damage and mud pumping. The analysis prioritizes the identification of the most influential meteorological variables associated with each defect.

Rainfall factors ranked among the top three in importance for every defect except settlement. Rainfall drives common subgrade defects mainly through its capacity to increase the moisture content of the pavement soil. We found that defects such as subgrade settlement, frost damage, mud pumping, and uplift deformation are intricately linked to the presence of water, which is consistent with many previous studies20,27,44,45,46. In the case of frost damage, soil moisture crystallizes into ice as temperatures drop and fills the soil voids, resulting in relative displacement of the subgrade particles. Mud pumping could be influenced by pore water pressure in the subgrade under train loads, which makes the subgrade highly susceptible to softening and pumping. Uplift deformation is associated with expansive rock and soil in the subgrade, which expand upon water absorption. Related research has shown a significant correlation between the vertical uplift deformation rate of the pavement and the amount of atmospheric rainfall. Furthermore, extreme precipitation, indicated by the rainfall amount over 5 consecutive days, could exceed the subgrade drainage capacity, elevating soil moisture and heightening susceptibility to defects.

We found that annual freezing days have the greatest impact on frost damage of the subgrade, which is consistent with indoor experiments on subgrade permafrost and with numerical simulations25. This is because annual freezing days are closely related to the freeze–thaw cycle of the subgrade, which can lead to frost damage. Based on this variable, frozen soil in the subgrade is classified into three categories: instantaneous frozen soil, seasonal frozen soil, and permafrost. Permafrost, due to its long-term exposure to cold, is often in a state of freezing expansion. Seasonal frozen soil experiences thawing and settlement in summer and freezing expansion in winter. The recurring cycle of freezing expansion and settlement poses a significant risk of HSR subgrade defects. The damage to the subgrade in regions where such seasonal freezing recurs is notably higher than in areas with permafrost. Instantaneous frozen soil generally experiences less freezing expansion. The effect on mud pumping mainly stems from the fact that, after the subgrade soil freezes and thaws, the water in the soil takes various forms, including ice crystals and residual moisture, which decreases the drainage capacity of the subgrade and makes it difficult to drain moisture effectively. Consequently, poor drainage can result in mud pumping in the subgrade.

Extreme heat, represented by the total number of days with a maximum daily temperature exceeding 35 degrees Celsius, could impact the HSR infrastructure. High temperature can cause the rubber material at the interjoint of the ballastless track slab to harden and fatigue, resulting in the detachment of the interjoint interface and unevenness in the track. With repeated cycles of high-temperature and low-temperature alternation, the rubber material may even fracture, contributing to mud pumping defects.

Geomorphological and geohydrological characteristics

Our results show that geomorphology had a significant impact on roadbed settlement and uplift deformation defects (Fig. 5), while the geohydrological factors showed comparatively less impact. This is inconsistent with the research findings of some researchers11,47, possibly because HSR has already avoided the risks caused by geohydrological variables during the design phase. On steeper slopes, especially during heavy rainfall, soil erosion is prone to occur on the subgrade surface. Soil erosion may accelerate the settlement process of the roadbed, affecting its stability. In areas with significant slopes, the speed of water flow may be higher, which could impact water infiltration and drainage. This may result in uneven distribution of moisture in the soil, thereby affecting the settlement behavior of the subgrade.

Moreover, as altitude increases, temperature, precipitation, and atmospheric pressure may change significantly, thereby affecting the stability of the subgrade. From a topographic perspective, high-altitude areas, characterized by steep slopes and complex terrain, show significant variations in local climate between foothills and hinterlands. Therefore, a more detailed analysis of the region and a precise subgrade risk assessment based on the local climate are necessary.

The microscopic properties of clay minerals contribute to their capability to absorb water molecules on their surfaces. Frost damage to the subgrade only occurs when the water in the soil reaches or exceeds a certain threshold, making clay more likely to cause frost damage than other soils. As for the uplift deformation of the subgrade, when clay absorbs water, its volume will increase and expand, which may lead to up-arching of the roadbed.

Anthropogenic influence

Our results show that anthropogenic variables have a significant impact on various subgrade defects, with the analyses emphasizing the most influential anthropogenic variables associated with most defects. Urban land use signifies the extent of anthropogenic interference, which includes the extraction of groundwater, the construction of underground facilities, mining and so on. When groundwater is extracted, the water pressure in the soil changes and the pumped water carries away fine particles from the soil, resulting in settlement of the subgrade. For example, in the Jakarta and Bandung areas along the Jakarta-Bandung High-Speed Rail, industrial activities and rapid population growth have resulted in the extensive extraction of groundwater, causing significant land subsidence that severely affects the operation of the high-speed train48. For the uplift deformation defect, in the high-density urban area, a large number of buildings. The area of bare rock reflects the surface area of land consisting of rocks. The bare rock areas may have good characteristics for water infiltration and drainage, which can slow down soil settlement through drainage. In addition, the scarcity of soil moisture may prevent the swelling of weak mudstone, thus reducing the occurrence of roadbed expansion defects and positively affecting the stability of the subgrade.

Limitations and future improvements

Our study has several limitations that can be addressed in future research. Firstly, the defect data in this study are sourced from peer-reviewed literature, which ensures accuracy but may miss unreported defects. Secondly, selecting the model’s hyperparameters poses significant challenges. The crucial parameters of the model are determined through trial and error using grid search. If the search space is set inappropriately or potential solutions are overlooked, the optimal solution may not be found. Furthermore, while the Random Forest method is recognized for its strong predictive capability, it falls short in interpretability. Future research should consider employing dedicated analytical methods to further explore the causal relationships among the influencing factors and improve understanding of the mechanisms by which these factors affect high-speed rail infrastructure. Lastly, due to limited resources and capabilities, the influencing factors selected in this study may not be comprehensive. In future research, we can incorporate more reliable data, including media reports, government documents, and bidding information, to avoid overfitting caused by insufficient data. Additionally, we can explore methods for optimizing the model’s parameters and include HSR attribute factors to further enhance the model’s accuracy.

Potential for application

This study applies a robust and effective machine-learning method for assessing the diverse defect risks inherent in China’s high-speed railway infrastructure. The practicality of the Random Forest method is not limited to specific geographic regions or infrastructure types; its powerful data processing capability and the ability to identify complex relationships between environmental factors grant it broad application potential. For instance, it can accommodate adjustments in environmental variables, such as rainfall and temperature variation, to suit various climatic zones (e.g., tropical, temperate, polar). Furthermore, this method can be applied to datasets for different infrastructures including roads, bridges, and tunnels, taking into account their unique risk factors and challenges. By fine-tuning the inputs to the algorithm, it is possible to precisely predict the specific risks faced by these different infrastructures, thereby providing a scientific basis for the design, construction, and maintenance of infrastructure.

Policy recommendations

Global climate change, marked by temperature increases, intensified precipitation, and extreme events, threatens HSR safety and reliability, affecting infrastructure and surrounding environments49,50,51. Particularly vulnerable regions like Minle County and Menyuan Hui Autonomous County (Fig. 4), with seasonal frozen soils, face heightened subgrade defects due to disrupted thermal equilibrium. To address these concerns, several policy recommendations are proposed.

First, research and development efforts for HSR infrastructure should be intensified, focusing on enhancing resilience to climate change by developing materials and technologies that can withstand extreme weather conditions. Real-time monitoring and early warning of the seasonally frozen soil environment should be strengthened in the regions that host specific HSR projects, such as the Lanzhou-Urumqi and Harbin-Dalian HSRs. The aim is to detect changes in the frozen soil environment promptly and to provide a scientific basis for the safe and stable operation of HSR projects. On the Far Eastern Railway in Russia, long-term monitoring of subgrade deformation, weather, and rock layers has been implemented on railway sections located in permafrost areas to mitigate the effects of extreme atmospheric precipitation52.

Moreover, the design standards of HSR projects should be revised to accommodate frozen soil climate conditions. Construction processes and methods should be optimized to ensure the safety and reliability of HSR projects in frozen soil areas during construction and operation. Emergency response plans and risk assessment systems for HSR must be established and enhanced in response to climate change. This includes augmenting early warning and response capabilities for extreme weather events and responding effectively to the sudden risks brought about by climate change. Similarly, in Norway, a preparedness framework has been developed to assess and manage natural climate risks, aiming to reduce railway vulnerability and enhance resilience against the negative impacts of climate change. The framework includes emergency plans for trains, such as speed restrictions in high-risk areas and alternative transportation when tracks are obstructed53.

Lastly, safety promotion in areas along HSR lines should be strengthened. The government should make full use of online channels, such as government websites, television broadcasting, and new media, as well as offline measures, such as home visits and prominent warning signs. These channels should be used to proactively publicize policies and regulations that protect the safety of the environment along the HSR line and reduce anthropogenic interference with HSR safety. In Sweden, a study of the Varberg Railway highlighted that human-induced groundwater extraction increases the risk of railway subsidence, suggesting the need for enhanced safety management measures along railway lines54.

Conclusions

This study quantitatively assesses the risk of multiple subgrade defects in China’s HSR infrastructure, utilizing machine learning and historical defect occurrence data. Key environmental factors influencing subgrade defects, such as rainfall, freezing days, extreme temperature, land use, slope, and altitude, are identified, providing valuable insights for HSR planning. Spatial analysis further reveals the distribution characteristics of different defects across various regions in China, particularly pointing out high-risk areas such as the Menyuan Hui Autonomous County and Minle County sections of the Lanzhou-Urumqi HSR, which require increased attention and preventative measures to minimize potential losses and ensure operational continuity.

For high-risk areas and types of defects, we recommend intensifying R&D efforts for HSR projects to develop materials and technologies capable of withstanding extreme weather conditions; optimizing design standards and construction methods for HSR projects, especially under frozen soil climate conditions; establishing and improving emergency response plans and risk assessment systems for HSR to address sudden risks posed by climate change; and enhancing safety promotion along HSR lines to reduce human interference and ensure the safe and stable operation of HSR. Although this study focuses on China’s HSR, the methods are adaptable to railway infrastructure risk assessment globally. Challenges remain in incorporating engineering design characteristics and evolving climate change impacts, and further research is needed to address them.