Mapping wind erosion hazard with regression-based machine learning algorithms

Gholami, Hamid; Mohammadifar, Aliakbar; Bui, Dieu Tien; Collins, Adrian L.

doi:10.1038/s41598-020-77567-0

Article
Open access
Published: 24 November 2020

Scientific Reports volume 10, Article number: 20494 (2020)

Abstract

Land susceptibility to wind erosion hazard in Isfahan province, Iran, was mapped by testing 16 advanced regression-based machine learning methods: robust linear regression (RLR), Cforest, non-convex penalized quantile regression (NCPQR), neural network with feature extraction (NNFE), monotone multi-layer perceptron neural network (MMLPNN), ridge regression (RR), boosting generalized linear model (BGLM), negative binomial generalized linear model (NBGLM), boosting generalized additive model (BGAM), spline generalized additive model (SGAM), spike and slab regression (SSR), stochastic gradient boosting (SGB), support vector machine (SVM), relevance vector machine (RVM), Cubist, and adaptive network-based fuzzy inference system (ANFIS). Thirteen factors controlling wind erosion were mapped, and multicollinearity among these factors was quantified using the tolerance coefficient (TC) and variance inflation factor (VIF). Model performance was assessed by RMSE, MAE, MBE, and a Taylor diagram using both training and validation datasets. The results showed that five models (MMLPNN, SGAM, Cforest, BGAM and SGB) delivered high prediction accuracy for land susceptibility to wind erosion hazard. Elevation (DEM), precipitation, and vegetation cover (NDVI) were the most important factors controlling wind erosion in the study area. Overall, regression-based machine learning models are efficient techniques for mapping land susceptibility to wind erosion hazards.

Introduction

Wind erosion is an environmental problem with many adverse effects on the economies of societies and on the health of terrestrial and marine ecosystems^1,2,3. Predicting land susceptibility to wind erosion hazards, such as dust emissions from land surfaces, is therefore essential for mitigating those effects. Various tools and techniques, including remote sensing, data mining, and sediment fingerprinting, have been proposed for investigating different aspects of wind erosion and its consequences, and in particular for identifying regions prone to supplying sediment for wind erosion^4,5,6,7. However, these techniques require intensive field sampling and expensive laboratory analyses⁸ and, as a result, are not efficient for large spatial domains.

Recently, alongside developments in geospatial technology and computer science, machine learning (ML) has received considerable attention, with many successful applications in the spatial mapping of environmental hazards such as land subsidence, gully erosion, landslides, and dust provenance, as well as in the mapping of soil properties (microbial dynamics, moisture, shear strength, soil taxa, bulk density, total nitrogen, organic carbon). However, to the best of our knowledge, the utility of advanced ML techniques for predicting land susceptibility to wind erosion has not yet been explored.

Typical ML models applied to date in different areas of environmental research include decision tree and linear equation models, the particle swarm optimization-adaptive network-based fuzzy inference system (PANFIS), genetic algorithms, support vector regression (SVR), artificial neural networks (ANN), hybrid models, random forest (RF), Wang and Mendel's (WM) method, partial least squares regression (PLSR), principal component regression (PCR), Cubist, Bayesian additive regression trees (BART), radial basis function (RBF), extreme gradient boosting (XGBoost) and regression tree analysis^{8,9,10,11,12,13,14,15}. To date, however, no comprehensive study has applied regression-based ML models to mapping wind erosion hazard, even though wind erosion is a major socio-economic challenge in some parts of the world. Accordingly, this work aimed to address this gap by providing a comprehensive assessment of the prediction performance of 16 regression-based ML models (robust linear regression (RLR), Cforest, non-convex penalized quantile regression (NCPQR), neural network with feature extraction (NNFE), monotone multi-layer perceptron neural network (MMLPNN), ridge regression (RR), boosting generalized linear model (BGLM), negative binomial generalized linear model (NBGLM), boosting generalized additive model (BGAM), spline generalized additive model (SGAM), spike and slab regression (SSR), stochastic gradient boosting (SGB), support vector machine (SVM), relevance vector machine (RVM), Cubist and adaptive network-based fuzzy inference system (ANFIS)) for mapping land susceptibility to wind erosion hazard in Isfahan province, central Iran. Using this case study, we provide more generic recommendations.

Results

Multicollinearity test

Table 1 shows the values of the tolerance coefficient (TC) and the variance inflation factor (VIF) for the factors controlling wind erosion. VIF > 10 and TC < 0.1 indicate multicollinearity among the effective factors. Based on our results, the lowest TC value was obtained for electrical conductivity (EC), while the highest VIF value (5.93) was calculated for bulk density. The results indicated the absence of multicollinearity among the 13 factors controlling wind erosion in the study area.

Table 1 Values of the TC and VIF for examining multicollinearity among the effective factors for wind erosion using the training dataset.

Relative importance of the factors affecting wind erosion

The model with the highest performance (MMLPNN) was applied to quantify the relative importance of the effective factors for wind erosion. Based on Fig. 1, three factors, DEM (with relative importance 0.95), precipitation (with relative importance 0.8), and NDVI (with relative importance 0.54), were recognized as the most important factors controlling wind erosion in the study area. Wind erosion has been shown to be affected by many factors such as wind, precipitation, temperature, soil properties (texture, composition, and aggregation), topography, aerodynamic roughness, vegetation, and land use practice¹⁶.

Discussion

Maps of wind erosion hazard

The wind erosion hazard maps generated by the 16 individual ML models are presented in Figs. 2, 3, and 4. Table 2 reports the area (percentage and km²) of the four land susceptibility classes (low, moderate, high, and very high) for wind erosion hazard estimated by each model. Across the 16 models, the area assigned to the low susceptibility class ranged between 15.5% (RVM and BGLM models) and 32.8% (MMLPNN model). The minimum and maximum areas of moderate susceptibility were estimated by the SGB (0.6%) and SSR (15.7%) models, respectively. The area categorized into the high susceptibility class ranged from 1.2% (MMLPNN model) to 20.2% (NCPQR model), and the area assigned to the very high class ranged from 41% (NBGLM model) to 65.2% (SGB model).

Table 2 Land susceptibility classes to wind erosion hazard calculated by 16 individual ML models.

Model performance assessment

Model performance for mapping wind erosion hazard was assessed using three indices (MAE, MBE, and RMSE; Fig. 5). In addition, Taylor diagrams were constructed for both the training and evaluation datasets (Fig. 6). MMLPNN was the most accurate model for mapping wind erosion hazard, whereas NBGLM was the weakest predictive model according to RMSE and MAE, and NCPQR was recognized as the worst model overall.

Based on all three statistical indicators of model performance and the Taylor diagram for the evaluation dataset, five models (MMLPNN, SGAM, Cforest, BGAM, and SGB) returned low errors. SSR and NBGLM had the lowest accuracies among the 16 models. Based on the Taylor diagram drawn for the training dataset, five models (MMLPNN, Cforest, SGAM, SGB and NNFE) were identified as the most accurate predictive ML models for mapping wind erosion hazard in the study area, whereas NBGLM and RVM were the weakest predictive models.

Overall, MMLPNN, SGAM, Cforest, BGAM, and SGB were identified as the most accurate models for predicting land susceptibility to wind erosion. Based on MMLPNN (Fig. 2e), the four susceptibility classes covered 32.8%, 1.1%, 1.2% and 64.9% of the total area of Isfahan province, respectively. The land susceptibility map to wind erosion hazard generated using SGAM shows the high and very high susceptibility classes covered 5.4% and 61.5% of the total area, respectively, whereas the low and moderate susceptibility classes occupied 27.4% and 5.6%, respectively (Fig. 3d). According to Cforest (Fig. 2b), 26%, 6.4%, 6.6%, and 61% of the total area belonged to the low, moderate, high and very high susceptibility classes, respectively. Using BGAM (Fig. 3c), the very high susceptibility class covered 62% of the study area, whereas the low, moderate, and high classes occupied 23.2%, 7.8% and 7% of the total area, respectively. Finally, in the case of the SGB model (Fig. 3f), the results classified 32%, 0.6%, 2.2% and 65.2% of the study area as low, moderate, high, and very high susceptibility, respectively.

The map of wind erosion hazard produced by MMLPNN is the most accurate. Multi-layer perceptron networks (MLPs), as universal approximators, are well-known techniques for system identification. The monotonicity of MMLPNN does not depend on the quality of the training data because it is guaranteed by the network structure¹⁷. GAM with spline functions (SGAM) was one of the five most accurate models for wind erosion hazard mapping. The spline functions allow flexible representation of non-linear marginal relationships between the explanatory and response variables without the need to define a specific functional form¹⁸. Cforest, a type of random forest (RF) model, uses conditional inference trees for prediction¹⁹. Several studies confirm the performance of RF as a suitable model for spatial prediction of environmental hazards. For example, one study²⁰ reported that RF was the best model for digital mapping of soil carbon fractions.

Some studies²¹ have also argued that RF has the highest predictive capability for modelling landslide susceptibility in comparison with other ML models, and others²² have reported that RF outperforms other methods in estimating monthly PM2.5 concentrations. In this study, we applied boosting with a generalized additive model (BGAM); based on the performance indicators, this model performed satisfactorily and was selected as one of the five most accurate models for mapping wind erosion hazard. Boosting is a technique for improving prediction rules, and it can be applied to classification and regression methods to increase prediction accuracy²³. SGB is related to both boosting and bagging^24,25. Previous research²⁶ has reported that SGB provides stable predictions of tree species presence.

Conclusions

This research assessed the performance of 16 individual regression-based ML algorithms for mapping land susceptibility to wind erosion hazard in an arid region in central Iran. In all, 13 effective factors for wind erosion were considered, and regions with active wind erosion were mapped using a wind erosion inventory map. Based on three statistical indicators and a Taylor diagram, the MMLPNN model was the most accurate model. We conclude that MMLPNN is a powerful tool for mapping wind erosion hazard in arid and semi-arid ecosystems worldwide. We recommend that future work focus on testing and comparing the performance of regression-based and classification-based ML models for the mapping and spatial modelling of wind erosion and dust sources, to ensure that robust evidence is provided to support management decisions.

Material and methods

Study area

Isfahan province (Fig. 7), an arid region, is located in central Iran between latitudes 30°45′59.51″ and 34°27′13.27″ N and longitudes 49°41′53.86″ and 55°30′13.67″ E. It experiences intensive wind erosion on its southeastern side (Segzi plain) and in its northern parts. Based on a digital elevation model (DEM), altitude is highly variable, ranging from 686 m (in the northern part of the study area and the southern parts of Dasht-e-Kavir) to 4398 m (in the vicinity of Dena Mountain in the southwestern part of the study area). The average annual precipitation ranges between 72 mm (in the eastern part, with a corresponding annual mean temperature of 18 °C) and 320 mm (in the western part, with an average annual temperature of 13 °C).

Factors controlling wind erosion

Different environmental and climatic factors control wind erosion in drylands. Environmental variables affecting wind erosion include soil properties, lithology, land use, vegetation cover, topography, and elevation^1,8,27. Previous research²⁸ introduced a local wind erosion climatic index, based on wind speed and the effective precipitation index developed by Thornthwaite²⁹, for application in the Chepil wind erosion equation (WEQ).

Soil characteristics

Seven soil characteristics, namely available water content (AWC) (Fig. 8a), bulk density (Fig. 8b), calcium carbonate percentage (Fig. 8c), electrical conductivity (EC) (Fig. 8d), exchangeable sodium percentage (ESP) (Fig. 8e), organic carbon content (OCC) (Fig. 8f) and soil texture (Fig. 9a), were extracted from the world soil map³⁰ and mapped by interpolation in ArcGIS 10.4.1. A total of 803 points (Fig. 7) were used for generating the spatial maps.

Lithology and land use

Lithology (Fig. 9b) and land use (Fig. 9c) were mapped spatially based on the maps produced by the Forests, Rangelands, and Watershed Management Organization of Iran (FRWMOI).

Vegetation cover

The normalized difference vegetation index (NDVI) (Fig. 9d)³¹, the most common index used for spatial mapping of vegetation cover, was applied in our study. NDVI is the difference between the near-infrared (NIR) and red (RED) bands divided by the sum of the two bands (Eq. 1).

$$ \text{NDVI} = \frac{\text{NIR}_{\text{b4}} - \text{RED}_{\text{b3}}}{\text{NIR}_{\text{b4}} + \text{RED}_{\text{b3}}} $$

(1)
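
As an illustration of Eq. 1, the following minimal sketch computes NDVI pixel by pixel. The reflectance values, the function names, and the choice to return NaN where both bands are zero are assumptions made for the example and are not taken from the study.

```python
import math


def ndvi(nir, red):
    """NDVI for one pixel from near-infrared and red band values (Eq. 1)."""
    total = nir + red
    if total == 0:
        # Assumed convention: the index is undefined when both bands are zero.
        return math.nan
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply Eq. 1 to two equally sized 2-D grids (lists of rows)."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2 x 2 reflectance grids, for illustration only.
nir_band = [[0.50, 0.30], [0.20, 0.40]]
red_band = [[0.10, 0.30], [0.20, 0.10]]
grid = ndvi_grid(nir_band, red_band)
```

Values close to 1 indicate dense green vegetation, whereas values near zero or below indicate bare soil, rock, or water.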

Elevation

A digital elevation model (DEM) (Fig. 9e) for the study area was generated using Shuttle Radar Topography Mission (SRTM) data with a 30 m × 30 m resolution⁸.

Climatic variables

Wind speed (Fig. 9f) and precipitation (Fig. 10a) were used as climatic factors affecting wind erosion. The spatial maps of these variables were generated based on the daily average wind speed and total annual precipitation data from 23 meteorological stations located in the Isfahan province. All spatial maps of factors controlling wind erosion were generated in ArcGIS 10.4.1.

Inventory map of wind erosion

An inventory map shows regions where the three-stage process of wind erosion, comprising detachment, transportation, and sedimentation, is active. Such a map is needed for predicting land susceptibility to wind erosion hazard. We used a map of regions with active wind erosion produced by the Forest, Rangeland and Watershed Management Organization of Iran (FRWMOI) (Fig. 10b). Based on the inventory map, regions with active wind erosion covered ~10,961 km² (440 pixels) in the study area. Pixels with active wind erosion were randomly assigned to the training (70%, or 308 pixels) and validation (30%, or 132 pixels) datasets for the ML models (Fig. 10c). The inventory map of wind erosion was generated in ArcGIS 10.4.1 based on field work and the FRWMOI map.
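
A random 70/30 partition of the inventory pixels can be sketched as follows. The integer pixel IDs, the function name, and the random seed are hypothetical; the study does not report how the random selection was implemented.

```python
import random


def split_pixels(pixel_ids, train_fraction=0.7, seed=42):
    """Randomly partition pixel IDs into training and validation subsets."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    shuffled = list(pixel_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical IDs for the 440 pixels with active wind erosion.
pixel_ids = list(range(440))
train_ids, validation_ids = split_pixels(pixel_ids)
```

Shuffling once and slicing guarantees that the two subsets are disjoint and together cover every inventory pixel.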

Multicollinearity among the factors controlling wind erosion

The tolerance coefficient (TC) (Eq. 2) and variance inflation factor (VIF) (Eq. 3) ^8,15,32 were applied to examine multicollinearity among the factors for wind erosion in the Isfahan province.

$$ \text{TC} = 1 - R^{2} $$

(2)

$$ \text{VIF} = \frac{1}{\text{TC}} $$

(3)

where R² is the coefficient of determination obtained by regressing each factor on all the other factors. A TC < 0.1 or, equivalently, a VIF > 10 signifies a multicollinearity problem.
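
The two diagnostics can be sketched in a dependency-free form as follows. Each factor is regressed on all the others by ordinary least squares with an intercept, and TC and VIF follow from the resulting R². The three factor columns are synthetic values chosen for illustration (x3 is deliberately close to x1), and the function names are hypothetical; this is not the authors' implementation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(predictors, target):
    """R^2 of a least-squares fit (with intercept) of target on predictors."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(target))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(rows, target)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def tc_and_vif(factors):
    """Return {name: (TC, VIF)}, following Eqs. 2 and 3."""
    result = {}
    for name, column in factors.items():
        others = [col for other, col in factors.items() if other != name]
        tc = 1.0 - r_squared(others, column)
        result[name] = (tc, 1.0 / tc)
    return result


# Synthetic factor columns (assumed data): x3 nearly duplicates x1.
factors = {
    "x1": [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0],
    "x2": [1.0, -2.0, 1.0, 1.0, -2.0, 1.0],
    "x3": [-1.1, 0.2, 0.9, -0.8, -0.1, 1.2],
}
diagnostics = tc_and_vif(factors)
```

In this synthetic case x1 and x3 exceed the VIF > 10 threshold because they carry almost the same information, while x2, which is uncorrelated with both, has a VIF close to 1.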

Background of the ML algorithms used

This section briefly describes the 16 individual regression-based ML algorithms, which were adopted for mapping wind erosion hazard. These algorithms are available in the caret package, in R software.

Robust linear regression (RLR)

Robust regression is designed to overcome some limitations of traditional parametric and non-parametric methods. Available robust regression methods include M-estimates³³, R-estimates³⁴, least median of squares (LMS) estimates³⁵, least trimmed squares (LTS) estimates and S-estimates³⁶, generalized S-estimates (GS-estimates)³⁷ and MM-estimates³⁸. We used a robust linear regression model with M-estimates for predicting land susceptibility to wind erosion.

Cforest

Random forest (RF), introduced by Breiman³⁹, is one of the most popular methods for regression and classification in decision tree learning⁴⁰. RF builds a large number of decision trees in the training phase and then, for regression, averages the output values of the trees to produce the final model output. Cforest is a type of RF commonly applied for prediction purposes¹⁹.

Non-convex penalized quantile regression (NCPQR)

Quantile regression (QR) has gained considerable attention in different fields of modelling since the work of Koenker and Bassett⁴¹. Compared with mean regression (MR), QR provides an alternative that is more efficient when the error term follows a non-normal, heavy-tailed distribution⁴². We used a penalized QR with a non-convex penalty function⁴² for mapping wind erosion hazard.
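
Quantile regression replaces the squared-error loss of mean regression with the asymmetric check (pinball) loss. The sketch below illustrates only this loss for a constant predictor on assumed data containing one extreme value; the non-convex penalty used in NCPQR is omitted, and the function names are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for quantile level tau in (0, 1)."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def quantile_objective(c, values, tau):
    """Total check loss when every value is predicted by the constant c."""
    return sum(check_loss(v - c, tau) for v in values)


# Assumed heavy-tailed sample: one extreme value dominates the mean.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
mean_value = sum(values) / len(values)
best_median = min(values, key=lambda c: quantile_objective(c, values, 0.5))
```

For tau = 0.5 the minimizer is the sample median, which is unaffected by the extreme value, whereas the mean is pulled far towards it.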

Neural networks (NN)

NN can accurately approximate complicated non-linear input/output relationships⁴³. The NN structure comprises a set of interconnected units, or neurons, that estimate the non-linear relationships among variables. The input neurons (predictor variables) are connected to one or more layers of hidden neurons, which are in turn linked to the output neurons⁴⁴. We used a NN with the feature extraction algorithm (NNFE)⁴⁵ and a monotone multi-layer perceptron neural network (MMLPNN)⁴⁶ for mapping wind erosion hazard. The feature extractors used textural features based on the spatial relationships between pixels⁴⁵.

Ridge regression with variable selection

Ridge regression (RR), which was proposed by⁴⁷, is expressed as follows (Eq. 4):

$$ L(w) = \sum_{i=1}^{n} \left( y_{i} - \tilde{y}_{i} \right)^{2} = \sum_{i=1}^{n} \left( y_{i} - w \cdot x_{i} \right)^{2} $$

(4)

Given a set of n vectors x₁, …, x_n in R^m, where m is the number of predictors, and the dependent variable y_i ∈ R, i = 1, …, n, the objective is to minimize the loss function, i.e., the discrepancy between the observed values y_i and the predicted values ỹ_i = w · x_i.

We applied a RR model with a kernel function⁴⁸ as follows:

$$ \tilde{y} = f(x) = \sum_{i=1}^{n} \beta_{i} K(x, x_{i}) $$

(5)

where $K(x, x_{i})$ is the kernel function and β_i is the weight assigned to the ith training point.
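
The kernel form of Eq. 5 can be sketched as follows. The weights β are obtained by solving (K + λI)β = y, which is the standard kernel ridge solution. The radial basis function kernel, the regularization strength λ, and the data are assumptions for illustration; the kernel and settings actually used in the study are not specified here.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel (an assumed choice of K)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_kernel_ridge(xs, ys, lam=1e-3, gamma=1.0):
    """Return the weights beta_i solving (K + lam * I) beta = y."""
    n = len(xs)
    gram = [
        [rbf_kernel(xs[i], xs[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return solve(gram, ys)


def predict(x, xs, betas, gamma=1.0):
    """Evaluate Eq. 5: the weighted sum of kernels centred on training points."""
    return sum(b * rbf_kernel(x, xi, gamma) for b, xi in zip(betas, xs))


# Assumed one-dimensional training data, for illustration only.
xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
ys = [0.0, 0.8, 0.9, 0.1]
betas = fit_kernel_ridge(xs, ys, lam=1e-8)
```

With a very small λ the fitted function almost interpolates the training values; larger values of λ shrink the weights and smooth the prediction.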

Generalized linear models (GLMs)

GLMs have been applied to a wide range of research⁴⁹. GLMs have three components: an observation model, a linear predictor, and an invertible link function⁵⁰. Using boosting with GLMs can improve prediction accuracy²³. We applied two GLMs: boosting GLM (BGLM) and negative binomial GLM (NBGLM)⁵¹.

Generalized additive models (GAMs)

GAMs⁵² can be expressed as follows:

$$ g(\mu_{i}) = Z_{i}^{*} \beta + \sum_{j} f_{j}(x_{ij}) $$

(6)

with

$$ \mu_{i} = E(Y_{i}) \quad \text{and} \quad Y_{i} \sim EF(\mu_{i}, \phi), $$

(7)

where $Y_{i}$ is the ith value of the response variable, drawn from an exponential family (EF) distribution with location parameter $\mu_{i}$ and scale parameter $\phi$; $g(\mu_{i})$ is the link function; $Z_{i}^{*}$ is the ith row of the parametric model matrix, with parameter vector β; f_j are unknown functions; and $x_{ij}$ is the ith value of the jth variable. We applied two GAMs: boosting GAM (BGAM) and spline GAM (SGAM)¹⁸.

Spike and slab regression (SSR)

SSR is one of the typical variable selection approaches in regression settings, and this model has been applied widely in challenging problems⁵³. SSR was proposed by⁵⁴ and can be expressed as follows⁵³:

$$ y_{i} = \beta_{1,0} x_{i,1} + \dots + \beta_{p,0} x_{i,p} + \varepsilon_{i}, \quad i = 1, \dots, n, $$

(8)

where (ε_i)_{1 ≤ i ≤ n} are independent random variables such that E(ε_i) = 0 and E($\varepsilon_{i}^{2}$) = $\sigma_{0}^{2} > 0$. Write X for the n × p design matrix corresponding to Eq. (8) and $\beta_{0} = (\beta_{1,0}, \dots, \beta_{p,0})^{T}$ for the true regression parameter. The variables $x_{i} = (x_{i,1}, \dots, x_{i,p})^{T}$ and the response vector $y = (y_{1}, \dots, y_{n})^{T}$ are assumed to be standardized such that $\sum_{i=1}^{n} x_{i,k} = 0$, $\sum_{i=1}^{n} x_{i,k}^{2} = n$, and $\sum_{i=1}^{n} y_{i} = 0$.

Stochastic gradient boosting (SGB)

SGB, or gradient boosting machine, proposed by Friedman²⁴, is a hybrid algorithm that combines the advantages of both bagging and boosting. It builds an additive regression model by fitting a base learner by least squares at each iteration.
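
The idea can be sketched for a single predictor as follows. At each iteration a random subsample is drawn (the bagging-like element), a regression stump is fitted by least squares to the current residuals (the boosting element), and a shrunken copy of the stump is added to the model. The stump base learner, the learning rate, the subsample fraction, and the data are all assumptions made for illustration; this is not the implementation used in the study.

```python
import random


def fit_stump(xs, residuals):
    """Least-squares regression stump: returns (threshold, left_mean, right_mean)."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to a constant
        m = sum(residuals) / len(residuals)
        return xs[0], m, m
    return best[1:]


def sgb_fit(xs, ys, n_rounds=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Fit an additive model of stumps on random subsamples of the residuals."""
    rng = random.Random(seed)
    f0 = sum(ys) / len(ys)  # initial constant model
    preds = [f0] * len(ys)
    stumps = []
    k = max(2, int(subsample * len(ys)))
    for _ in range(n_rounds):
        idx = rng.sample(range(len(ys)), k)  # subsample without replacement
        t, lv, rv = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return f0, learning_rate, stumps


def sgb_predict(model, x):
    """Sum the initial constant and the shrunken stump outputs."""
    f0, lr, stumps = model
    return f0 + lr * sum(lv if x <= t else rv for t, lv, rv in stumps)


# Assumed toy data: a smooth non-linear response on [0, 1].
xs = [i / 19 for i in range(20)]
ys = [x ** 2 for x in xs]
model = sgb_fit(xs, ys)
mean_y = sum(ys) / len(ys)
mse_constant = sum((mean_y - y) ** 2 for y in ys) / len(ys)
mse_model = sum((sgb_predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

For squared-error loss the residuals are the negative gradient of the loss, which is why fitting them by least squares amounts to a gradient step in function space.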

Support and relevance vector machine (SVM and RVM) algorithms

The relevance vector machine (RVM) is a probabilistic sparse kernel model identical in functional form to the support vector machine (SVM). SVM is a very successful approach to supervised learning, and it makes predictions based on the following function⁵⁵:

$$y\left(x\right)=\sum_{n=1}^{m}{w}_{n} K \left(x, {x}_{n}\right)+ {w}_{0},$$

(9)

where ${w}_{n}$ indicates the model weights and K (. , .) is a kernel function. We applied two algorithms, SVM with linear kernel function and RVM with polynomial kernel function.

Cubist

Cubist, a rule-based regression tree algorithm, is based on the M5 theory⁵⁶. This model involves four main steps as follows: (1) growing a tree by branching data, (2) developing the model, (3) pruning the tree, and (4) smoothing the tree⁵⁷.

Adaptive network-based fuzzy inference system (ANFIS)

ANFIS has been applied in many scientific fields. It works on the basis of fuzzy if-then rules⁵⁸:

$$ \text{Rule 1: if } (x \text{ is } A_{1}) \text{ and } (y \text{ is } B_{1}) \text{ then } (f_{1} = p_{1} x + q_{1} y + r_{1}) $$

(10)

$$ \text{Rule 2: if } (x \text{ is } A_{2}) \text{ and } (y \text{ is } B_{2}) \text{ then } (f_{2} = p_{2} x + q_{2} y + r_{2}) $$

(11)

where x and y are the input parameters of the fuzzy inference system (FIS), f is the FIS output, A and B are fuzzy sets, and p, q, and r are the consequent parameters.
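
The forward pass of a two-rule system of this kind can be sketched as follows. Gaussian membership functions, the product as the fuzzy "and", and all parameter values are assumptions for illustration. The training step that makes the system adaptive, in which the membership and consequent parameters are tuned to data, is not shown.

```python
import math


def gaussian_membership(value, center, sigma):
    """Degree of membership of value in a fuzzy set (assumed Gaussian shape)."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def sugeno_inference(x, y, rules):
    """Weighted average of the rule outputs f = p*x + q*y + r."""
    weights, outputs = [], []
    for rule in rules:
        # Firing strength: product of the two membership degrees (assumed "and").
        w = gaussian_membership(x, *rule["A"]) * gaussian_membership(y, *rule["B"])
        p, q, r = rule["consequent"]
        weights.append(w)
        outputs.append(p * x + q * y + r)
    return sum(w * f for w, f in zip(weights, outputs)) / sum(weights)


# Hypothetical parameters: A and B are (center, sigma); consequent is (p, q, r).
rules = [
    {"A": (0.0, 1.0), "B": (0.0, 1.0), "consequent": (1.0, 1.0, 0.0)},
    {"A": (1.0, 1.0), "B": (1.0, 1.0), "consequent": (2.0, 0.0, 1.0)},
]
output = sugeno_inference(0.5, 0.5, rules)
```

At the input (0.5, 0.5) both rules fire equally, so the output is the simple average of f₁ = 1.0 and f₂ = 2.0.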

In all 16 models, the predicted values for pixels ranged between 0 and 1. We therefore divided the susceptibility predictions into four classes: low (0–0.25), moderate (0.25–0.50), high (0.50–0.75), and very high (0.75–1).
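
This classification, and the share of pixels falling in each class, can be sketched as follows. Assigning a value that lies exactly on a boundary to the higher class is an assumption, as the text does not state how boundary values were handled; the predicted values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

CLASS_BOUNDS = [0.25, 0.50, 0.75]
CLASS_NAMES = ["low", "moderate", "high", "very high"]


def susceptibility_class(p):
    """Map a predicted value in [0, 1] to one of the four classes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("prediction must lie between 0 and 1")
    # Assumed convention: a boundary value falls in the higher class.
    return CLASS_NAMES[bisect_right(CLASS_BOUNDS, p)]


def class_shares(predictions):
    """Percentage of pixels assigned to each susceptibility class."""
    counts = Counter(susceptibility_class(p) for p in predictions)
    return {name: 100.0 * counts[name] / len(predictions) for name in CLASS_NAMES}


# Hypothetical pixel predictions, for illustration only.
predictions = [0.05, 0.30, 0.62, 0.80, 0.91, 0.99, 0.10, 0.77]
shares = class_shares(predictions)
```

Multiplying each share by the total mapped area would give class areas in km².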

Assessment of model performance

In order to evaluate model performance in predicting land susceptibility to wind erosion hazard in the study area, three statistical methods comprising root mean square error (RMSE), mean absolute error (MAE)^59,60 and mean bias error (MBE) were used:

$$ \text{RMSE} = \sqrt{\frac{\sum_{i=1}^{m} (v_{k,i} - v_{p,i})^{2}}{m}} $$

(12)

$$ \text{MAE} = \frac{\sum_{i=1}^{m} \left| v_{k,i} - v_{p,i} \right|}{m} $$

(13)

$$ \text{MBE} = \frac{1}{m} \sum_{i=1}^{m} (v_{k,i} - v_{p,i}) $$

(14)

where m is the number of observations, and $v_{k,i}$ and $v_{p,i}$ are the measured and predicted values for observation i, respectively. A Taylor diagram was also applied as a further test of the performance of the individual regression-based ML models¹⁴.
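
The three error measures can be computed directly from paired measured and predicted values, as in the following minimal sketch. The sample values and function names are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error (Eq. 12)."""
    return math.sqrt(sum((k - p) ** 2 for k, p in zip(measured, predicted)) / len(measured))


def mae(measured, predicted):
    """Mean absolute error (Eq. 13)."""
    return sum(abs(k - p) for k, p in zip(measured, predicted)) / len(measured)


def mbe(measured, predicted):
    """Mean bias error (Eq. 14); positive values indicate under-prediction."""
    return sum(k - p for k, p in zip(measured, predicted)) / len(measured)


# Hypothetical measured and predicted values, for illustration only.
measured = [1.0, 0.0, 1.0, 1.0]
predicted = [0.5, 0.5, 1.0, 1.0]
scores = (rmse(measured, predicted), mae(measured, predicted), mbe(measured, predicted))
```

In this example the over- and under-predictions cancel, so MBE is zero even though RMSE and MAE are not, which is why MBE is best read together with the other two indices.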

Prioritization of the factors controlling wind erosion

Among the 16 ML models tested, the model with the lowest error (RMSE, MAE, and MBE) was applied to quantify the relative importance of the factors controlling wind erosion. In this study, MMLPNN had the lowest error (with RMSE, MAE, and MBE < 0.002%) and was therefore used to determine the relative importance of the factors controlling wind erosion.

A brief overview of the main steps used in our methods is presented in Fig. 11.

References

Prospero, J. M., Ginoux, P., Torres, O., Nicholson, S. E. & Gill, T. E. Environmental characterization of global sources of atmospheric soil dust identified with the Nimbus 7 Total Ozone Mapping Spectrometer (TOMS) absorbing aerosol product. Rev. Geophys. 40(1), 1–31 (2002).
Article Google Scholar
Goossens, D. On-site and off-site effects of wind erosion. In Wind Erosion on Agricultural Land in Europe (ed. Warren, A.) 29–38 (Luxembourg, European Commission, 2003).
Google Scholar
Dahmardeh Behrooz, R., Gholami, H., Telfer, M. W., Jansen, J. D. & Fathabadi, A. Uisng GLUE to pull apart the provenance of atmospheric dust. Aeolian Res. 37, 1–13 (2019).
Collins, A. L., Blackwell, M., Boeckx, P., Chivers, C. A., Emelko, M., Evrard, O., & Harris, P. Sediment source fingerprinting: benchmarking recent outputs, remaining challenges and emerging themes. J. Soils Sedim. 1–34 (2020).
Rashki, A., Kaskaoutis, D. G., Goudie, A. S. & Kahn, R. A. Dryness of ephemeral lakes and consequences for dust activity: The case of the Hamoun drainage basin, southeastern Iran. Sci. Total Environ. 463–464, 552–564 (2013).
Article ADS PubMed CAS Google Scholar
Gholami, H., Rahimi, S., Fathabadi, A., Habibi, S., & Collins, A. L. Mapping the spatial sources of atmospheric dust using GLUE and Monte Carlo simulation. Sci. Total Environ. 138090 (2020).
Schepanski, K., Tegen, I. & Macke. A. Comparison of satellite based observations of Saharan dust source areas. Remote Sens. Environ. 123, 90–97 (2012).
Gholami, H., Mohamadifar, A. & Collins, A. L. Spatial mapping of the provenance of storm dust: Application of data mining and ensemble modelling. Atmos. Res. 233(1), 104716 (2020).
Article Google Scholar
Bondi, G., Creamer, R., Ferrari, A., Fenton, O. & Wall, D. Using machine learning to predict soil bulk density on the basis of visual parameters: Tools for in-field and post-field evaluation. Geoderma 318, 137–147 (2018).
Article ADS Google Scholar
Pham, B. T. et al. A novel artificial intelligence approach based on multi-layer perceptron neural network and biogeography-based optimization for predicting coefficient of consolidation of soil. CATENA 173, 302–311 (2019).
Article Google Scholar
Prasad, R., Deo, R. C., Li, Y. & Maraseni, T. Soil moisture forecasting by a hybrid machine learning technique: ELM integrated with ensemble empirical mode decomposition. Geoderma 330, 136–161 (2018).
Article ADS Google Scholar
Gholami, H., Mohammadifar, A., Pourghasemi, H. R., & Collins, A. L. A new integrated data mining model to map spatial variation in the susceptibility of land to act as a source of aeolian dust. Environ. Sci. Pollut. Res. 1–18 (2020).
Jha, S. K. & Ahmad, Z. Soil microbial dynamics prediction using machine learning regression methods. Comput. Electron. Agric. 147, 158–165 (2018).
Article Google Scholar
Gholami, H., Mohamadifar, A., Sorooshian, A. & Jansen, J. D. Machine-learning algorithms for predicting land susceptibility to dust emissions: The case of the Jazmurian Basin, Iran. Atmos. Pollut. Res. 11, 1303–1315 (2020).
Article CAS Google Scholar
Pourghasemi, H. R., Yousefi, S., Kornejady, A. & Cerda, A. Performance assessment of individual and ensemble data-mining techniques for gully erosion modeling. Sci. Total Environ. 609, 764–775 (2017).
Article ADS CAS PubMed Google Scholar
Shao, Y. Physics and modelling of wind erosion. Atmos. Oceanogr. Sci. Library 37, 459 (2008).
Google Scholar
Lang., B. Monotonic multi-layer perceptron networks as universal approximators. In International Conference on Artificial Neural Networks (ICANN), 31–37 (2005).
Gerling, L., Löschau, G., Wiedensohler, A. & Weber, S. Statistical modelling of roadside and urban background ultrafine and accumulation mode particle number concentrations using generalized additive models. Sci. Total Environ. 134570 (2019).
Hagenauer, J., Omrani, H. & Helbich, M. Assessing the performance of 38 machine learning models: The case of land consumption rates in Bavaria, Germany. Int. J. Geogr. Inf. Sci. 33(7), 1399–1419 (2019).
Article Google Scholar
Keskin, H., Grunwald, S. & Harris, W. G. Digital mapping of soil carbon fractions with machine learning. Geoderma 339, 40–58 (2019).
Article ADS CAS Google Scholar
Chen, W. et al. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. CATENA 151, 147–160 (2017).
Article Google Scholar
Xu, Y. et al. Evaluation of machine learning techniques with multiple remote sensing datasets in estimating monthly concentrations of ground-level PM_2.5. Environ. Pollut. 242, 1417–1426 (2018).
Sutton, C. D. Classification and regression trees, bagging, and boosting. Handb. Stat. (Elsevier) 24, 303–329 (2005).
Article Google Scholar
Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001).
Article MathSciNet MATH Google Scholar
Friedman, J. H. Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378 (2002).
Article MathSciNet MATH Google Scholar
Moisen, G. G. Predicting tree species presence and basal area in Utah: A comparison of stochastic gradient boosting, generalized additive models, and tree-based methods. Ecol. Model. 199, 176–187 (2006).
Article Google Scholar
Saadoud, D., Hassani, M., Peinado, F. J. M. & Guettouche, M. S. Application of fuzzy logic approach for wind erosion hazard mapping in Laghouat region (Algeria) using remote sensing and GIS. Aeol. Res. 32, 24–34 (2018).
Article Google Scholar
Chepil, W. S., Siddoway, F. H. & Armbrust, D. V. Climate factor for estimating wind erodibility of farm fields. J. Soil Water Conserv. 17, 162–165 (1962).
Google Scholar
Thornthwaite, C. W. An approach towards a rational classification of climate. Geogr. Rev. 38, 55–94 (1948).
Article Google Scholar
IUSS-WRB. World Reference Base for Soil Resources 2014, Update 2015. International Soil Classification System for Naming Soils and Creating Legends for Soil Maps. World Soil Resources Reports No. 106 (FAO, Rome, 2015).
Lamchin, M. et al. Assessment of land cover change and desertification using remote sensing technology in a local region of Mongolia. Adv. Space Res. 57, 64–77 (2016).
Article ADS Google Scholar
Bui, D. T., Pradhan, B., Lofman, O., Revhaug, I. & Dick, O. B. Spatial prediction of landslide hazards in Vietnam: A comparative assessment of the efficacy of evidential belief functions and fuzzy logic models. CATENA 96, 28–40 (2012).
Article Google Scholar
Huber, P. J. Robust Statistics (Wiley, New York, 1981).
Jaeckel, L. A. Estimating regression coefficients by minimizing the dispersion of the residuals. Ann. Math. Stat. 43, 1449–1458 (1972).
Siegel, A. F. Robust regression using repeated medians. Biometrika 69, 242–244 (1982).
Rousseeuw, P. & Yohai, V. Robust regression by means of S-estimators. in Robust and Nonlinear Time Series Analysis (eds Franke, J., Härdle, W. & Martin, R. D.) Lecture Notes in Statistics Vol. 26, 256–272 (Springer, New York, 1984).
Croux, C., Rousseeuw, P. J. & Hossjer, O. Generalized S-estimators. J. Am. Stat. Assoc. 89, 1271–1281 (1994).
Yohai, V. J. High breakdown-point and high efficiency robust estimates for regression. Ann. Stat. 15, 642–656 (1987).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Srivastava, R., Tiwari, A. N. & Giri, V. K. Solar radiation forecasting using MARS, CART, M5, and random forest model: A case study for India. Heliyon 5(10), e02692 (2019).
Koenker, R. & Bassett, G. Regression quantiles. Econometrica 46, 33–50 (1978).
Ma, H., Li, T., Zhu, H. & Zhu, Z. Quantile regression for functional partially linear model in ultra-high dimensions. Comput. Stat. Data Anal. 129, 135–147 (2019).
Krasnopolsky, V. M. & Chevallier, F. Some neural network applications in environmental sciences. Part II: Advancing computational efficiency of environmental numerical models. Neural Netw. 16, 335–348 (2003).
Heung, B. et al. An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping. Geoderma 265, 62–77 (2016).
Horn, Z. C., Auret, L., McCoy, J. T., Aldrich, C. & Herbst, B. M. Performance of convolutional neural networks for feature extraction in froth flotation sensing. IFAC-PapersOnLine 50(2), 13–18 (2017).
Cannon, A. J. Multi-Layer Perceptron Neural Network with Optional Monotonicity Constraints. Package (2017).
Hoerl, A. E. & Kennard, R. W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12(1), 50 (1970).
Saunders, C., Gammerman, A. & Vovk, V. Ridge regression learning algorithm in dual variables. in Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), 515–521 (San Francisco, CA, USA, 1998).
Agostinelli, C., Valdora, M. & Yohai, V. J. Initial robust estimation in generalized linear models. Comput. Stat. Data Anal. 134, 144–156 (2019).
Hosack, G. R., Hayes, K. R. & Barry, S. C. Prior elicitation for Bayesian generalised linear models with application to risk control option assessment. Reliab. Eng. Syst. Saf. 167, 351–361 (2017).
Shirazi, M., Lord, D., Dhavala, S. S. & Geedipally, S. R. A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data. Accid. Anal. Prevent. 91, 10–18 (2016).
Hastie, T. J. & Tibshirani, R. J. Generalized additive models. Stat. Sci. 1(3), 297–310 (1986).
Ishwaran, H. & Rao, J. S. Consistency of spike and slab regression. Stat. Probab. Lett. 81, 1920–1928 (2011).
Lempers, F. B. Posterior Probabilities of Alternative Linear Models (Rotterdam University Press, Rotterdam, 1971).
Tipping, M. E. The relevance vector machine. in NIPS Proceedings (2000).
Quinlan, R. Learning with continuous classes. in Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Australia, 16–18 November 1992; 343–348 (1992).
Nguyen, H., Bui, X. N., Tran, Q. H. & Mai, N. L. A new soft computing model for estimating and controlling blast-produced ground vibration based on hierarchical K-means clustering and cubist algorithms. Appl. Soft Comput. J. 77, 376–386 (2019).
Jang, J. S. R. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 23(3), 665–685 (1993).
Gholami, H., Jafari TakhtiNajad, E., Collins, A. L. & Fathabadi, A. Monte Carlo fingerprinting of the terrestrial sources of different particle size fractions of coastal sediment deposits using geochemical tracers: some lessons for the user community. Environ. Sci. Pollut. Res. 26, 23206 (2019).
Fan, M., Hu, J., Cao, R., Ruan, W. & Wei, X. A review on experimental design for pollutants removal in water treatment with the aid of artificial intelligence. Chemosphere 200, 330–343 (2018).

Acknowledgements

The authors would like to thank the Faculty of Agriculture and Natural Resources, University of Hormozgan, Iran, for supporting this joint research project. Input by ALC was supported by the UKRI (UK Research and Innovation) Biotechnology and Biological Sciences Research Council (BBSRC) via grant award BBS/E/C/000I0330.

Author information

Authors and Affiliations

Department of Natural Resources Engineering, University of Hormozgan, Bandar-Abbas, Hormozgan, Iran
Hamid Gholami & Aliakbar Mohammadifar
Institute of Research and Development, Duy Tan University, Da Nang, 550000, Vietnam
Dieu Tien Bui
GIS Group, Department of Business and IT, University of South-Eastern Norway, 3800, Bø i Telemark, Norway
Dieu Tien Bui
Sustainable Agriculture Sciences, Rothamsted Research, North Wyke, Okehampton, EX20 2SB, Devon, UK
Adrian L. Collins

Authors

Hamid Gholami
Aliakbar Mohammadifar
Dieu Tien Bui
Adrian L. Collins

Contributions

H.G. conceived the original idea of the research. Modelling work was undertaken by H.G. and A.M. H.G., D.T.B. and A.C. co-wrote the manuscript.

Corresponding authors

Correspondence to Hamid Gholami or Dieu Tien Bui.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gholami, H., Mohammadifar, A., Bui, D.T. et al. Mapping wind erosion hazard with regression-based machine learning algorithms. Sci Rep 10, 20494 (2020). https://doi.org/10.1038/s41598-020-77567-0

Received: 06 January 2020
Accepted: 10 November 2020
Published: 24 November 2020
DOI: https://doi.org/10.1038/s41598-020-77567-0
