A machine learning framework for multi-hazards modeling and mapping in a mountainous area

This study sought to produce an accurate multi-hazard risk map for a mountainous region of southwestern Iran that has experienced numerous extreme natural events in recent decades. The study models the probabilities of snow avalanches, landslides, wildfires, land subsidence, and floods using three machine learning models: support vector machine (SVM), functional discriminant analysis (FDA), and generalized linear model (GLM). Climatic, topographic, geological, social, and morphological factors were the main input variables, and the data were obtained from several sources. The accuracies of the GLM, SVM, and FDA models indicate that SVM is the most accurate for predicting landslide, land subsidence, and flood hazards in the study area. GLM is the best algorithm for wildfire mapping, and FDA is the most accurate model for predicting snow avalanche risk. The values of AUC (area under the curve) for all five hazards using the best models are greater than 0.8, demonstrating that the models' predictive abilities are acceptable. A machine learning approach can be a very useful tool for hazard management and disaster mitigation, particularly for multi-hazard modeling. The predictive maps provide valuable baselines for risk management in the study area and evidence for managing future human interaction with hazards.

Scientific Reports | (2020) 10:12144 | https://doi.org/10.1038/s41598-020-69233-2 | www.nature.com/scientificreports/
from 28 meteorological stations, digital stream layers, and 895 piezometric wells. These data were obtained from the Regional Water Company of Chaharmahal and Bakhtiari. The social factors were extracted from road networks and residential areas mapped on 1:25,000 topographic maps. The vegetation factors were derived from Landsat 8 OLI images from June 2018. In addition, after evaluating the importance of the effective factors for each hazard, specific factors were selected for modeling each hazard: 12 for wildfires, 8 for snow avalanches, 12 for landslides, 12 for land subsidence, and 12 for floods.
Application of machine learning models. Three state-of-the-art machine learning models were applied in the present study to construct the hazard risk maps. Each is explained below.
Functional discriminant analysis (FDA). FDA is a statistical method for analyzing effective factors. In general, discrimination-based models work in an unsupervised manner in which each class is subdivided into its own subclasses and each subclass is given a special value [71,72]. The FDA model is a special combination of regression models that implements a hidden process for each class in the modeling process, especially when conducting complex class modeling [73,74]. The FDA model is similar to other statistical methods and can perform just as well [75]. Because the FDA model is nonparametric, it has been used in a wide range of fields [76]. The FDA model is relatively new to data analysis, but it has been convenient to use as a replacement for functions, so more attention should be paid to this method [77].
Generalized linear model (GLM). The GLM is regression-based, so it can reveal differences between variables [78]. The GLM is built from several linear models and constructs a best regression model that can predict multiple events [79-81]. Some researchers have reported that GLM is most often used for spatial modeling [55,82-85]. In general, the GLM uses multiple regression to increase the accuracy and quality of the results because it can establish a very clear relationship between the dependent and independent variables [86].
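As a minimal sketch of the regression idea, assuming a binomial GLM with a logit link (the text does not state the family or link that was used), hazard presence can be regressed on a single conditioning factor. The function names, the toy data, the learning rate, and the iteration count are illustrative assumptions and do not reproduce the study's R workflow.

```python
import math


def sigmoid(z):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_glm(X, y, lr=0.1, n_iter=5000):
    """Fit an intercept and coefficients by gradient ascent on the
    binomial log-likelihood (assumed family; hypothetical helper)."""
    n_features = len(X[0])
    beta = [0.0] * (n_features + 1)  # beta[0] is the intercept
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = yi - p  # gradient contribution of this observation
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict_proba(beta, x):
    """Predicted probability of hazard presence for one location."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


# Hypothetical toy data: one standardized factor, 1 = hazard observed.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
beta = fit_logistic_glm(X, y)
```

Calling `predict_proba(beta, [value])` for every map cell would give a continuous susceptibility surface that can then be binned into classes.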
Support vector machine (SVM). SVM performs both classification and regression, based on the concept of supervised learning. Results have shown that it generates the smallest clustering errors [87]. Because its approach is based on statistical learning theory, it reduces errors and identifies the optimal response [88]. SVM obtains its estimate by solving a convex optimization problem [89,90]. The SVM model provides a very important advantage: it identifies and analyzes layers effectively [91].
Multi-hazards risk mapping. Snow-avalanche hazard (SAH), landslide hazard (LH), wildfire hazard (WFH), land-subsidence hazard (LSH), and flood hazard (FH) maps were created from the effective factors with the three machine learning models (Fig. 4). First, susceptibility to each hazard was modeled from the dependent variables (locations of landslides, floods, avalanches, etc.) and a set of effective factors (the independent variables) using machine learning techniques. Next, the models with the highest accuracies, determined from ROC-AUC values, were selected and used for multi-hazard mapping. These models were integrated using a Boolean algorithm based on four classes for each hazard: low, moderate, high, and very high. Following the literature [44,45], the low and moderate susceptibility classes were treated as low hazard (0) conditions and the high and very high classes as high hazard (1) conditions. To facilitate integration, the four-class maps produced for each hazard by the best models (from among the three algorithms) were reassigned to these two classes, 0 and 1. The maps of the five natural events (floods, landslides, land subsidence, snow avalanches, and wildfires) were combined to create an integrated multi-hazard (MH) map (i.e., MH = SAH + LH + LSH + WFH + FH) in ArcGIS, and the result was reclassified (Fig. 5).
Accuracy assessment. The accuracy of each of the MH maps was assessed with the training group data (for the goodness-of-fit test) and the validation group data (for the predictive-performance test) using the area under the curve (AUC). AUC is a scalar, threshold-independent measure [92,93]. An area of 1 represents perfect classification, while an area of 0.5 or less indicates poor classification of locations by a model [45,94-96].
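The AUC used for accuracy assessment can be illustrated with a short sketch. It relies on the standard rank interpretation of the ROC-AUC: the probability that a randomly chosen hazard location receives a higher predicted score than a randomly chosen non-hazard location, with ties counted as one half. The function name and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for observed hazard locations, 0 for non-hazard locations.
    scores: predicted susceptibility for the same locations.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical validation sample: three hazard and three non-hazard points.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

In this toy sample 8 of the 9 hazard/non-hazard pairs are ranked correctly, so the function returns 8/9, or about 0.89. Because only the ordering of the scores matters, no classification threshold is needed.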
In the present study, the multi-hazard susceptibility maps of snow avalanches, land subsidence, wildfires, landslides, and floods were produced with the GLM, FDA, and SVM models using dedicated packages in R version 3.5.3. The packages used were "svm" [60,97], "glm" [55,63], and "fda" [17,66].
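The Boolean integration described under multi-hazards risk mapping can be sketched as follows. This is only an illustration of the reclassification and cell-wise sum: the study performed these steps in ArcGIS on five hazard maps, whereas the sketch uses plain Python lists, two tiny hypothetical 2 x 2 grids, and helper names invented for the example.

```python
HIGH_CLASSES = {"high", "very high"}


def to_binary(class_map):
    """Reassign four susceptibility classes to 0 (low, moderate)
    or 1 (high, very high)."""
    return [[1 if c in HIGH_CLASSES else 0 for c in row] for row in class_map]


def combine(binary_maps):
    """Cell-wise sum of binary hazard maps, e.g. MH = SAH + LH + ..."""
    rows, cols = len(binary_maps[0]), len(binary_maps[0][0])
    return [[sum(m[r][c] for m in binary_maps) for c in range(cols)]
            for r in range(rows)]


def percent_with_count(mh_map, k):
    """Percentage of cells exposed to exactly k hazards."""
    cells = [v for row in mh_map for v in row]
    return 100.0 * sum(1 for v in cells if v == k) / len(cells)


# Hypothetical four-class maps for two of the hazards (not study data).
sah = [["low", "high"], ["moderate", "very high"]]
lh = [["low", "low"], ["high", "very high"]]

mh = combine([to_binary(sah), to_binary(lh)])
hazard_free = percent_with_count(mh, 0)
```

With all five binary maps passed to `combine`, each cell value ranges from 0 (no hazard) to 5, and `percent_with_count(mh, 0)` gives the share of the area free of every hazard.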

Results
Accuracy assessments of the hazard maps using AUC. Assessing the accuracies of the three machine learning models (Table 2) demonstrated that FDA (for SAH), SVM (LSH), GLM (WFH), SVM (LH), and SVM (FH) provided the most accurate models. The AUC values of these five models were all greater than 0.8, indicating strong classification success and confirming that the models were acceptably accurate.
Integrated multi-hazard (MH) map. The results of the MH map show that the hazards do not overlap (Table 3 and Fig. 5). About one sixth (16.51%) of the province is free of all five hazards. The remaining five sixths (83.49%) of Chaharmahal and Bakhtiari Province experience at least one of the hazards.

Discussion
Arid and semi-arid regions of the world experience extreme natural events that threaten the structures and daily functions of localities [98]. Natural hazards can cause extensive economic damage [99], interruptions, injuries, and loss of life. Mountainous regions are among the most disaster-prone parts of the world because of their geological, climatological, and hydrological characteristics [100,101].
An effective way to begin to manage natural disasters is to map hazards. The information generated can be very useful for effective planning and management of people and activities. Most natural hazards studies have focused on single hazards. Single-hazard approaches treat hazards as independent phenomena, ignoring the relationships between them [32], and this may lead to miscalculations of risk [102-107]. A greater emphasis on the interactions between and combinations of hazards' risks is needed [102]. Studies that have focused on multi-hazard approaches have concluded that the interactions of multiple hazards yield collectively greater risk than is obtained by simply combining the results of single-hazard studies. The increasing use of GIS in natural resources management and the introduction of various algebraic, statistical, and empirical methods have enabled better assessments of natural hazards. These methods have been developed in different parts of the world based on different conditions and with different amounts of available data, but they have advanced the modeling process and have revealed the spatial distributions of the natural hazards in many study areas.
Several methods have been used to model and map different natural hazards. For example, flood risk has been assessed using support vector machine (SVM), frequency ratio (FR), multivariate statistical analysis, weight of evidence (WoE), analytic hierarchy process (AHP), and decision trees (DTs). The AHP method is one of the most common ways to solve problems associated with the use of multiple variables [108,109], and it is often used in hazard assessments [110]. However, AHP-based mapping is very sensitive to changes in experts' judgments and in the weighting of the input variables at the assessment scale, which are significant disadvantages [109]. The most popular methods used in landslide risk assessments are neuro-fuzzy inference systems [111], logistic regression models, analytic hierarchy process, statistical indices [112], vector-based methods [113], and artificial neural networks [114]. For wildfire risk assessment, probabilistic models and maximum entropy models [115] [...] tree (DT) [126], the random forest (RF) [127-129], and support vector machine (SVM) [24,130] have been used. Numerous methods have also been used for mapping snow avalanche risk: multi-criteria decision making approaches [131-133], fuzzy-frequency ratio models [134-136], numerical methods, dynamic models [137], and remote sensing-based methods [138,139]. Though remote sensing can provide useful information about snow avalanches, the complex relationships between snow avalanches and geomorphometric variables are often overlooked, and most risk assessments are based on expert opinion. Prediction of land subsidence risk has used methods such as artificial neural networks [140], frequency ratio [141], logistic regression [142], and differential radar interferometry [143].
Machine learning is another modeling technique that is increasingly used to understand the complex relationships between a wide range of independent variables, such as meteorological factors (winds, air pressure, storm surge, and floods), and a dependent variable [144]. Therefore, these algorithms can aid forecasting of multiple hazards simultaneously where the environmental conditions vary considerably across a landscape [145].
In this study, we assessed five hazards in a mountainous region of Iran. To comprehensively assess extreme natural events in the study area, multi-hazard mapping was conducted using three machine learning models. Evaluation of the accuracies of the SVM, GLM, and FDA models showed that SVM is most accurate when predicting landslide, land subsidence, and flood risks, GLM is most accurate for wildfire risk, and FDA is most accurate for snow-avalanche risk prediction (Table 2). The AUCs of the five best models were over 0.8, validating their strong performances [146] and demonstrating that they predicted the patterns of the hazards in the study area with acceptable accuracy. The SVM method produced very good results for mapping landslide, land subsidence, and flood risks. Li et al. [147] applied SVM with univariate and multivariate statistical methods to investigate land subsidence. Their results showed that SVM is more accurate than the other algorithms they tested. Others have confirmed the high performance of SVM for similar purposes [24,131,148,149]. Studies of landslide risk have also revealed that highly accurate predictions were made with SVM [112,150]. The strong capacity of SVM to predict flood risk has also been demonstrated [151-153]. GLM has been used to predict wildfire risk [123,154-158]. GLMs have proven to predict wildfire risks acceptably in California [155,159] and Spain [156,160]. The MH risk map was developed by combining the results produced by the SVM, GLM, and FDA approaches. Results demonstrate that using the best machine learning models to predict several hazards yields useful information about their interactions. Multi-hazard relationships are very dependent upon the scale of analysis and the specific sets of hazards. Understanding the relationships and interactions between multiple hazards is an important challenge [103]. This study begins to fill this gap.
The results show that all five hazards are absent from 16.5% of the study area. The rest of the study area, 83.5%, is likely to be impacted by at least one of the hazards. Pourghasemi et al. [54] mapped both the individual and collective risks posed by three hazards (floods, forest fires, and landslides) in a multi-hazard study using machine learning techniques. Others have conducted multi-hazard risk assessments, but separately for each risk [46,47,161].

Conclusions
Because mountainous areas face a wide array of natural hazards and sites within them are prone to exposure to multiple hazards, this study evaluated the spatial distribution of risk from multiple hazards in Chaharmahal and Bakhtiari Province, Iran, using three machine learning models (SVM, GLM, and FDA). Identification of high-risk areas is the most important issue for most decision makers and natural resource managers. In this regard, we presented a multi-hazard risk map for five natural hazards (floods, landslides, land subsidence, snow avalanches, and wildfires) in the study area. Evaluation of the accuracies of the maps produced by the SVM, GLM, and FDA models showed that SVM is the most accurate model for predicting landslide, land subsidence, and flood risks, GLM is best for wildfire risk prediction, and FDA is best for snow avalanche risk assessment in the region. The results indicate that 16.5% of the study area is not likely to experience any of the five natural hazards, but the rest of the province (83.5%) is at risk from exposure to at least one of the five (or several or perhaps all): 11.41% possesses snow avalanche risk, 11.07% wildfire risk, and 9.83% landslide risk. Each type of machine learning method achieved acceptable levels of accuracy in its predictions.
Therefore, these results can be regarded with high confidence and may be used in future studies to examine the spatial distributions of risks from multiple hazards and to provide useful information for proactive management and hazard mitigation.