Introduction

Water resources shortages are caused by climatic variability and change, population growth, and mismanagement posing challenges to meeting the water requirements in many countries1,2. Water resources management involves hydraulic and hydrologic issues and must consider social and economic conditions. Early hydro-social research of water systems relied heavily on geographic assessments and introduced methods for understanding the feedbacks between water and human systems3. Hydro-social studies are based on recognizing the close interactions between human systems and water, the social and cultural meanings of water, and how they relate to water systems and water management options4. The hydro-social cycle, focusing on the feedback systems between human and water interactions, recognizes the human impact on the hydrological cycle as part of the dialectical development of water systems and social systems5.

There are several definitions of water scarcity and different interpretations of its meaning. At the basic level, water scarcity is governed by the quantity and distribution of water and by natural and human factors2. Human populations are affected by water stresses and per capita shortages of renewable water. Water scarcity is generally considered a global challenge for humanity6. Human actions have caused a diverse set of water sustainability challenges that must be addressed by new approaches to water management7. Research has shown that water scarcity results not only from physical scarcity but also from complex interactions between water resources and social phenomena6. In addition to the social and economic crises of water, it has been emphasized that the water problem is not simply about scarcity but a crisis of water management. Today's goal in managing water systems is to define new interdisciplinary solutions. Hydro-social science studies the interactions between human factors, water flows, hydraulic technologies, biophysical elements, socio-economic structures, and cultural-political institutions in the management of water systems8. This science evaluates water resource systems considering human influences such as withdrawals, impoundments, and other human-induced changes in hydrological systems9. This means that water shortages and adverse trends in hydrological systems also affect human interests and, ultimately, social wellbeing10. Therefore, modeling and predicting these relationships is useful for understanding hydro-social interactions, managing water resources more efficiently, and improving societal assessments.

Hydro-social indicators are useful measures of the relative magnitude of phenomena11, and they support decision making in water resources management. These indicators are evaluated to assess the effectiveness of current policies and future management activities and investment decisions based on factors such as the definition or perception of water among stakeholders, the primary mechanisms of water perception by stakeholder groups (e.g., social, historical, economic, political, spiritual, etc.), the feedbacks between water and society that create a "water community", and the consequences of hydro-social interactions between stakeholders and their water12,13. Understanding hydro-social relationships raises awareness about best practices for water use14.

The importance of social parameters to comprehensive water resources management, and the indivisible relationship between them, is now well understood. Understanding and modeling the feedbacks between social parameters and per capita renewable water yields predictions about countries' future social evolution. Soft computing and artificial-intelligence-based methods such as the Artificial Neural Network (ANN), adaptive neuro-fuzzy inference system (ANFIS), gene expression programming (GEP), Multivariate Adaptive Regression Splines (MARS), the M5 Tree model, Support Vector Machines (SVM), Random Forests (RF), and Multiple Linear Regression (MLR) have been successfully employed in water quality and quantity modeling. These data-based prediction methods have been used to predict various phenomena in different fields, including water resources management.

Hydro-social science evaluates the relationship between water science and social science and seeks mathematical functions that predict phenomena in these two fields. Hydro-social research stems mainly from the convergence of political ecology and technology studies3. Ross et al.3 argued that most studies on water resources systems in the context of hydro-social issues are related to irrigation, water scarcity, dams, groundwater, desalination, glaciers, sanitation, and mining. Carey et al.15 introduced social sciences in the hydrological modeling of glacial basins. They studied glacier melting caused by climate change in the Santa River Basin, Peru, employing five variables: political agendas and economic development, governance (laws and institutions), technology and engineering, land and resource use, and societal responses. These authors presented a hydrological modeling tool in hydro-social science to understand the impact of climate change on glacier shrinkage that affects the human population. Chen et al.16 applied ordinary least squares (OLS) and geographically weighted regression (GWR) models to identify the effect of land use and population density indices on surface water quality in wet and dry seasons in the Wen-Rui Tong River basin in eastern China. Their results revealed that the impact of these indicators varies with the spatial and seasonal scales. In suburban and rural areas, urban land was identified as the primary influencing factor concerning pollutants during the wet season, while agricultural land was identified as a more prevalent influencing factor during the dry season. Ženko et al.6 investigated the effect of water shortage on water users' mental health based on gender and age group in Iran's Urmia Lake basin, and determined that water scarcity adversely affects the economy, social relations, and people's health. At the same time, all these factors threaten the mental health of water users. 
The latter authors evaluated hydro-social factors and showed that water problems lead to biophysical, financial, and social changes that impact the health of water users through chronic psychological stress, social isolation, intra-community conflicts, despair, hopelessness, depression, and anxiety. Shrestha et al.17 analyzed competition and conflict over water scarcity in the Kathmandu Valley, Nepal, and showed human distress due to water insecurity and the inability to integrate political, social, and economic factors to allow access to water services and institutions. Devkota et al.18 combined hydrological analysis and flood modeling in the West Rapti River basin with a community survey of 240 households based on public perceptions. They examined flood adaptation strategies that had already occurred or are likely to occur in the future and applied a hybrid hydro-social approach to demonstrate the importance of flood plans to raise local flood awareness. Weigleb et al.19 evaluated the path from the MDGs to the SDGs to achieve the Sixth Sustainable Development Goal (SDG 6). The key factors in this respect are the management of problem sources and not their effects, increasing integration of issues and sectors, inclusion of environmental goals, more flexible management approaches, participation and collaborative decision-making, more attention to managing human behavior through "soft measures", open and shared information systems, and incorporation of learning cycles. They considered the "hydro-social cycle" a concept to connect society and the vital water element. Bui et al.20 assessed the social sustainability of groundwater resources in Hanoi, Vietnam, concerning three main groundwater characteristics (quantity, quality, and management). 
The sustainability indices for groundwater quantity, quality, and management were rated as good, poor, and acceptable with the values 0.68, 0.27, and 0.52, respectively, which resulted in Hanoi being rated at an acceptable level with the value of 0.49 for the social sustainability assessment. Pande et al.21, exploiting the theory of metabolism, explored the relationship between birth rate and local water consumption by considering the virtual water content, virtual water trade, and agricultural production at the 7-continent scale. They investigated whether the average rate of human metabolism controls or is controlled by per capita water consumption, and reported that continents with relatively low birth rates, including North America, Europe, and Oceania, feature relatively high per capita water consumption, while developing regions exhibit an opposite pattern of association. Diaz et al.22 implemented the Driver-Pressure-State-Impact-Response (DPSIR) framework to explore the association between river ecosystems and the social system of the Biobío Basin in Chile. Sixty-five indicators with data spanning 35 years were selected for assessing the DPSIR in the study area. The trend analysis results indicate a significant reduction in biodiversity and the deterioration of regulatory services and non-material goods for human well-being, while cultural services, direct and indirect pressures, and institutional responses increased. Forouzani et al.2 applied the Q method (a social sciences technique useful for discerning views, opinions, beliefs, and attitudes) to identify farmers' and agricultural experts' understanding of agricultural water poverty and its causes in Marvdasht city, Iran. 
They surveyed the traits of agricultural water poverty with the Q method to identify four distinct types of farmers (management-adherents, adaptive-adherents, fatalists, and support seekers), and three types of agricultural specialists (farmer blamer pessimists, technocratic realists, and optimists). Li et al.23 examined the impact of various socio-economic activities on Lake Tai's water quality in China and demonstrated that severe ecological pressures from repeated and intense socio-economic activities can lead to the decline of the ecological functions of lakes and threaten aquatic organisms' health. Their results indicate a significant association between the average annual concentrations of total nitrogen (TN), total phosphorous (TP), chemical oxygen demand (COD), and biological oxygen demand (BOD) and the population, per capita gross domestic product (GDP), and sewage discharge. Several other studies have reported social-science and soft computing applications to water resources investigations7,24,25,26,27. The literature on soft-computing methods is too vast to be reviewed in this paper; therefore, only a small set of references that have applied soft computing methods is highlighted herein.

Various researchers have examined the inclusion of social indicators in assessing water resources to understand how and which of the social parameters have the most significant impact on the water system; however, few have uncovered the patterns of association that govern hydro-social indicators quantitatively on a broad scale. It is essential to consider the rationality of the statistical association between social parameters and water resource parameters, and the level of interaction between these factors deserves further research.

This paper develops functions of worldwide application for water-resources factors within the context of hydro-social science. With respect to the current state of the art in hydro-social science, the innovations of this work are: (1) application of soft-computing methods (i.e., Artificial Neural Network, adaptive neuro-fuzzy inference system, and gene expression programming) for linking hydro-social science and water science; (2) estimation of several social variables (rural society, urban society, technology and communication, and knowledge) as a function of the water resources of the continents, and estimation of water resources in terms of social variables; (3) mathematical functions for social parameters are shown to be scalable in space and time.

Methodology

Selected indicators

The renewable water per capita (RWPC) is chosen as the overall indicator of water resource status. The indicators corresponding to rural society, urban society, technology and communication, and knowledge are the ratio of rural to urban population (PRUP), population density (PD), number of internet users (IU), and education index (EI), respectively; each of them is defined below.

  • Renewable water per capita (RWPC): Renewable water is the amount of water that a basin can replenish during the annual water cycle. Per capita renewable water is the available volume of renewable water per person every year measured in millions of cubic meters per person.

  • The ratio of rural to urban population (PRUP): This index compares the number of people living in rural areas to the number of people living in urban areas (the rural population divided by the urban population). The PRUP is herein proposed as a potential indicator of the water resources' status.

  • Population density (PD): Population density measures the number of inhabitants per unit area. The unit of this parameter is persons per square kilometer. Several authors have applied population density as an indicator of the status of water resources28,29,30,31,32,33,34,35,36,37,38,39,40.

  • Internet users (IU): The number of people who have access to the Internet and use it for their daily work. This indicator reflects how effectively information related to water use can be disseminated to the public. The unit of this parameter is the percentage of the internet-using population with respect to the total population. The term "Water Internet" refers to the link between water use and internet connectivity. The Water Internet is a source of water supply information for involved organizations and citizens in general22,34,39.

  • Education index (EI): The educational level is a leading determinant of a person’s knowledge about the use of water resources. The Education index (EI) is calculated as the average years of schooling received by a population of individuals34.

Figure 1 displays the phases of this paper's methodology. This work proposes the social indicators PRUP, PD, IU, and EI to quantify the RWPC in Africa, America, Asia, Europe, and Oceania with soft computing methods (ANN, ANFIS-SC, and GEP). The analysis evaluated two types of functional patterns of association: (1) per capita data on water resources were applied as input and values of social parameters were quantified as output; (2) social parameters were applied as input and the per capita water resources parameter was quantified as output. The components of this paper’s methodology are shown in Fig. 2.

Figure 1

The phases of this work’s methodology.

Figure 2

The components of this paper’s methodology.

The per capita renewable water data and social indicators were normalized for each country with Eq. (1). Normalized values range between 0 and 1.

$$ X_{N} = \frac{{(X_{i} - X_{\min } )}}{{(X_{\max } - X_{\min } )}} $$
(1)

where \(X_{N}\), \(X_{i}\), \(X_{\min}\), and \(X_{\max}\) denote the normalized value, the real value, the minimal value, and the maximal value, respectively. Normalization is performed before the modeling process so that the algorithm examines the various dimensions of the data on the same standardized range. This work applies Eq. (1) to both the training and testing data41,42.
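As an illustration of Eq. (1), the following minimal Python sketch rescales a series to the range 0 to 1. The function name `min_max_normalize` and the sample values are hypothetical and are not taken from the study's data; the handling of a constant series is an added assumption, because Eq. (1) is undefined when the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Scale a sequence of numbers to [0, 1] following Eq. (1)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant series maps to zeros, since Eq. (1)
        # would otherwise divide by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Hypothetical indicator values for one country (illustrative only).
rwpc = [1200.0, 800.0, 400.0, 1000.0]
normalized = min_max_normalize(rwpc)
print(normalized)
```

The largest value maps to 1 and the smallest to 0, so indicators with different units become comparable model inputs.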

This study selected countries for analysis that exhibit a decreasing trend of per capita renewable water in the 13-year study period (2005–2017). The studied countries' names classified by continent are listed in Table 1. Africa, America, Asia, Europe, and Oceania include 48, 35, 43, 20, and 5 countries across numerous climate conditions, respectively. This study randomly chose 70% of each continent's countries for model training and 30% for model testing. This work relied on data extracted from the Knoema database (https://knoema.com). The statistical characteristics such as average, standard deviation, and coefficient of variation of the hydro-social indicators are listed in Table 2. The coefficients of variation of per capita renewable water equaled 0.663, 0.654, 0.683, 0.329, and 0.640 for Africa, America, Asia, Europe, and Oceania, respectively. Europe has the smallest coefficient of variation of PRUP, PD, IU, and EI indicators. The effect of the Köppen climatic classification was herein considered to examine the patterns of association between hydro-social indicators. Figure 3 displays the Köppen climate classification of the world. The Köppen climate classification scheme43 divides the climates into five main groups (A, B, C, D, and E); each group can be further classified by precipitation and temperature conditions.
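The random 70/30 split of countries and the coefficient of variation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed seed, the placeholder country labels, and the use of the population standard deviation are all hypothetical choices, since the paper does not specify these details.

```python
import random
import statistics


def split_countries(countries, train_fraction=0.7, seed=0):
    """Randomly assign countries to training and testing sets."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    shuffled = list(countries)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean.

    Assumption: the population standard deviation is used; the paper
    does not state which estimator was applied.
    """
    return statistics.pstdev(values) / statistics.mean(values)


# Placeholder labels standing in for the 20 European countries of Table 1.
europe = [f"country_{i:02d}" for i in range(1, 21)]
train, test = split_countries(europe)
print(len(train), len(test))
```

With 20 countries, the split yields 14 training and 6 testing countries.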

Table 1 Countries included in this study in the training and testing periods.
Table 2 The statistical characteristics of hydro-social indicators.
Figure 3

The Köppen climate classification of the world. (Zone A: tropical or equatorial zone; Zone B: arid or dry zone; Zone C: warm/mild temperate zone; Zone D: continental zone; Zone E: polar zone)59.

Soft-computing models

Soft computing methods are effective in detecting new and valuable information from large datasets with the purpose of discovery, classification, and forecasting44. The well-known ANNs, ANFIS, and GEP are described in the next section.

ANNs

Artificial neural networks (ANNs) are computing systems inspired by biological neural networks. The initial aim of neural networks was to solve complex problems mimicking the human mind. Over time ANNs' focus shifted to emulating specific mental abilities. An ANN is based on a set of connected units or nodes, called artificial neurons (similar to biological neurons in the animal brain). Any synapse between neurons can transmit a signal from one neuron to another. The receiving neuron can process the signals. In conventional ANN the synapse signal is a real number, and a nonlinear function of its inputs calculates each neuron's output. Neurons and synapses apply weights that are adjusted as learning progresses. This weight increases or decreases the signal strength that it sends to the synapse. Neurons are commonly organized in layers. The signals travel from the first layer (input) to the last layer (output), and they may travel multiple times45,46,47. Three fundamental characteristics of ANNs for determining an optimal solution are: (1) the applied algorithm, (2) the activation functions, and (3) the neurons, as follows:

  1. The applied algorithm and layer characteristics: The Levenberg–Marquardt (LM) algorithm with a three-layer network was selected for this study because of its fast convergence in training networks. The error backpropagation algorithm adjusts the network weights and bias values so that the error function decreases more rapidly.

  2. Activation functions: Selecting the activation function has a significant effect on the accuracy of the network output. There are three main activation functions for neural network modeling: the Logsig, Tansig, and Purelin functions. Several activation functions are tested to find the best combination of activation functions in a network with one to three hidden layers. The Logsig, Tansig, and Purelin functions were applied in the hidden and output layers. Using one or a combination of these activation functions (between layers) may lead to an optimal model with the highest correlation value and the smallest error.

  3. The neuron number determination: Determining the number of hidden neurons that yields a network with the least error in predicting the desired outputs is essential. Trial-and-error is the best way to determine the optimal number of neurons in the hidden layer of ANN models48,49,50. The number of neurons in the hidden layer has a significant effect on the neural network's performance. Using too few neurons prevents the neural network from learning most of the patterns accurately. On the other hand, too many neurons lead the network to memorize the training patterns, which prevents it from learning to recognize their basic features.

The authors wrote the ANN program code in MATLAB.
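The three ingredients listed above can be illustrated with the forward pass of a network with one input, one hidden layer, and one output. This Python sketch is not the authors' MATLAB code: the weights are hypothetical, and the Levenberg–Marquardt training step is deliberately omitted. The functions `tansig`, `logsig`, and `purelin` are written to match the standard mathematical definitions of the hyperbolic tangent sigmoid, the logistic sigmoid, and the linear transfer function.

```python
import math


def tansig(x):
    """Hyperbolic tangent sigmoid, output in (-1, 1)."""
    return math.tanh(x)


def logsig(x):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def purelin(x):
    """Linear (identity) transfer function."""
    return x


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias,
            hidden_activation=tansig, output_activation=purelin):
    """Forward pass: one input -> hidden layer -> one output."""
    hidden = [hidden_activation(w * x + b)
              for w, b in zip(hidden_weights, hidden_biases)]
    z = sum(v * h for v, h in zip(output_weights, hidden)) + output_bias
    return output_activation(z)


# Hypothetical weights for a network with two hidden neurons.
estimate = forward(0.0, [1.0, -1.0], [0.0, 0.0], [0.5, 0.5], 0.1)
print(estimate)
```

In a trial-and-error search, a loop over candidate hidden-layer sizes and activation functions would train one such network per candidate and keep the one with the smallest test error.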

ANFIS

An adaptive neuro-fuzzy inference system (ANFIS) is an artificial neural network based on the Takagi–Sugeno fuzzy system51; it uses a set of fuzzy if–then rules and learns to identify nonlinear fitting functions. ANFIS is a universal estimator52 and is herein applied. Fuzzy if–then rules are required to specify functions between the fuzzy variables of a fuzzy system. Equations (2) and (3) give a typical rule set with two fuzzy if–then rules in a first-order Sugeno system:

$$ {\text{Rule}}\,1:\,{\text{If}}\,{\text{x}}\, = {\text{A}}1\,{\text{and}}\,{\text{y}} = {\text{B}}1,\,{\text{then}}\,{\text{f}}1 = {\text{p}}1{\text{x}} + {\text{q}}1{\text{y}} + {\text{r}}1 $$
(2)
$$ {\text{Rule}}\,2:\,{\text{If}}\,{\text{x}}\, = {\text{A}}2\,{\text{and}}\,{\text{y}} = {\text{B}}2,\,{\text{then}}\,{\text{f}}2 = {\text{p}}2{\text{x}} + {\text{q}}2{\text{y}} + {\text{r}}2 $$
(3)

where A1 (LOW) and A2 (LOW) denote the membership functions (MFs) for input x, B1 (HIGH) and B2 (MEDIUM) denote the MFs for input y, and p, q, and r are the consequent parameters of each rule. In ANFIS-SC the cluster radius governs the number of clusters, so an efficient radius value was determined by trial and error. The radius varies from 0.20 to 0.60 (ref. 42).

The ANFIS method with Subtractive Clustering (ANFIS-SC) is herein applied. Subtractive Clustering (SC) is an extension of the mountain clustering method proposed by Yager and Filev53, in which the data are clustered by evaluating the potential of data points in the feature space54. Linear least squares (LLS) estimation is applied to determine the MF's output, following previous works42,50,55. The authors wrote the code for ANFIS-SC in MATLAB.
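To make the two-rule first-order Sugeno system of Eqs. (2) and (3) concrete, the sketch below computes the system output as the firing-strength-weighted average of the rule outputs. It rests on several assumptions that are not stated in the paper: Gaussian membership functions, the product operator for "and", and hypothetical centers, widths, and consequent parameters.

```python
import math


def gaussian_mf(value, center, sigma):
    """Gaussian membership grade (assumed MF shape)."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def sugeno_output(x, y, rules):
    """First-order Sugeno inference: weighted average of rule outputs."""
    weights, outputs = [], []
    for rule in rules:
        # Firing strength: product of the two membership grades (assumption).
        w = gaussian_mf(x, *rule["A"]) * gaussian_mf(y, *rule["B"])
        p, q, r = rule["pqr"]
        weights.append(w)
        outputs.append(p * x + q * y + r)  # f_i = p_i x + q_i y + r_i
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, outputs)) / total


# Hypothetical rule base: (center, sigma) for each MF and (p, q, r) per rule.
rules = [
    {"A": (0.0, 0.5), "B": (0.0, 0.5), "pqr": (1.0, 0.0, 0.0)},
    {"A": (1.0, 0.5), "B": (1.0, 0.5), "pqr": (0.0, 1.0, 1.0)},
]
result = sugeno_output(0.5, 0.5, rules)
print(result)
```

At x = y = 0.5 both rules fire equally, so the output is the plain average of f1 = 0.5 and f2 = 1.5. In ANFIS-SC, subtractive clustering would supply the MF centers, and the cluster radius controls how many rules are generated.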

GEP

Gene expression programming is a method of mathematical modeling based on evolutionary computation and inspired by natural evolution. This method was introduced by Ferreira56 in 1999 and advanced in 2001. The GEP algorithm combines the main ideas of its two predecessor evolutionary algorithms to resolve their weaknesses. GEP features a chromosome genotype similar to that of a genetic algorithm (GA), and the phenotype of a chromosome has a tree structure of variable length and size similar to that of the genetic programming algorithm56. The design and implementation steps of GEP are: (1) defining the fitness function; (2) defining the terminals and functions; (3) determining the structure of chromosomes (number of generations, number, and length of genes); (4) determining the linking function of genes; (5) specifying the operators and executing the algorithm56. The fundamental characteristics of GEP for determining the optimal model or fitting functions are the mutation rate (MR), the inversion rate (IR), the IS transposition rate (ISTR), the RIS transposition rate (RISTR), the one-point recombination rate (OPRR), the two-point recombination rate (TPRR), the gene recombination rate (GRR), and the gene transposition rate (GTR), whose values are listed in Table 3. The penalizing tool with parsimony pressure was applied in this study, which implements several mathematical functions to predict the hydro-social indicators: \( ( + , - ,*,/,\ln x,e^{x} ,x^{2} ,x^{3} ,\sqrt x ,\sqrt[3]{x},\sin x,\cos x,\arctan (x)) \).

Table 3 The GEP model parameters.

This study first employs the terminal set for the renewable water per capita indicator with output sets containing PRUP, PD, IU, and EI, and then employs the terminal sets for the cited social indicator with output renewable water per capita indicator. This study employs the soft-computing software GeneXpro Tools 4.0.
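The genotype-to-phenotype step of GEP can be illustrated with a small decoder for a gene written in Karva notation (a K-expression), which is read breadth-first into an expression tree. The sketch below is a simplified illustration and not the GeneXpro implementation: it handles only a reduced function set (`+`, `-`, `*`, `/`, and `Q` for the square root), it assumes a valid gene with enough tail symbols, and the gene and variable values are hypothetical.

```python
import math

# Number of arguments per function symbol; any other symbol is a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}  # Q = square root


def decode_k_expression(gene):
    """Build the expression tree of a K-expression, read breadth-first."""
    nodes = [[symbol] for symbol in gene]  # node = [symbol, child, ...]
    next_child = 1  # index of the next symbol not yet attached to a parent
    position = 0
    while position < next_child:
        node = nodes[position]
        for _ in range(ARITY.get(node[0], 0)):
            node.append(nodes[next_child])
            next_child += 1
        position += 1
    return nodes[0]  # symbols beyond next_child are non-coding


def evaluate(node, variables):
    """Recursively evaluate an expression tree."""
    symbol, children = node[0], node[1:]
    args = [evaluate(child, variables) for child in children]
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "/":
        return args[0] / args[1]
    if symbol == "Q":
        return math.sqrt(args[0])
    return variables[symbol]  # terminal


# The gene Q*+-abcd encodes sqrt((a + b) * (c - d)).
tree = decode_k_expression("Q*+-abcd")
value = evaluate(tree, {"a": 3, "b": 1, "c": 6, "d": 2})
print(value)
```

Because the tree is rebuilt from a fixed-length string, genetic operators such as mutation and transposition can act on the string while every decoded expression remains syntactically valid.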

Evaluating model performance

The goodness-of-fit correlation coefficient (R), root mean squared error (RMSE), and mean absolute error (MAE) were applied to evaluate the models' performance. The R, RMSE, and MAE are respectively defined as follows:

$$ R = \left[ {\frac{{\sum\limits_{i = 1}^{N} {(hydrosocio_{io} - \overline{hydrosocio}_{o} )(hydrosocio_{ie} - \overline{hydrosocio}_{e} )} }}{{\sqrt {\sum\limits_{i = 1}^{N} {(hydrosocio_{io} - \overline{hydrosocio}_{o} )^{2} \sum\limits_{i = 1}^{N} {(hydrosocio_{ie} - \overline{hydrosocio}_{e} )^{2} } } } }}} \right] $$
(4)
$$ RMSE = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {(hydrosocio_{io} - hydrosocio_{ie} )^{2} } } $$
(5)
$$ MAE = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {hydrosocio_{io} - hydrosocio_{ie} } \right|} $$
(6)

where \(\overline{hydrosocio}_{o}\) and \(\overline{hydrosocio}_{e}\) denote the average observed and estimated hydro-social indicators’ values respectively, \(hydrosocio_{io}\) and \(hydrosocio_{ie}\) denote the observed and estimated hydro-social indicators’ values respectively, and N denotes the number of data points.

The correlation coefficient (R) measures the degree of statistical association (positive or negative) between variables. The RMSE measures the goodness of fit by comparing the estimated values and the observed values, giving higher weight to high values of observations. The MAE measures the distribution of goodness of fit at moderate values57. The models’ performances are optimal if the R and RMSE are closer to 1 and 0, respectively. This study also employs various graphical methods to display models’ results.
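Equations (4) through (6) translate directly into code. The following Python sketch uses hypothetical function names and sample values chosen only to make the arithmetic easy to check.

```python
import math


def correlation_coefficient(observed, estimated):
    """Correlation coefficient R, Eq. (4)."""
    mean_o = sum(observed) / len(observed)
    mean_e = sum(estimated) / len(estimated)
    covariance = sum((o - mean_o) * (e - mean_e)
                     for o, e in zip(observed, estimated))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_e = sum((e - mean_e) ** 2 for e in estimated)
    return covariance / math.sqrt(spread_o * spread_e)


def rmse(observed, estimated):
    """Root mean squared error, Eq. (5)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2
                         for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error, Eq. (6)."""
    n = len(observed)
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / n


# Hypothetical normalized values: every estimate is offset by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.5, 3.5, 4.5]
print(correlation_coefficient(observed, estimated),
      rmse(observed, estimated),
      mae(observed, estimated))
```

The example shows why R alone is insufficient: a constant offset leaves R equal to 1 while the RMSE and MAE both equal 0.5.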

Results and discussion

Evaluating indicators

Among the selected parameters, the ratio of rural to urban population relates directly to the per capita renewable water, whereas the population density, internet users, and education index exhibit an inverse relation with the per capita renewable water worldwide. That is, the per capita renewable water decreases with a decreasing ratio of rural to urban population and with increasing population density, internet users, and education index. The urban population has increased in developing regions, which feature increasing population density. People's health is threatened by poor urban sanitary infrastructure, leading to disease and social decay. Increasing population density and a reduction in per capita renewable water inflict social harm and disrupt society's economic growth58. Population density is also positively related to the relative number of elderly and to social vulnerability because potential casualties increase with population size40. Likewise, as the number of Internet users and the education index increased, the per capita renewable water decreased. As the knowledge and awareness of communities improved, the consumption algorithm decreased, leading to a reduction of renewable water per capita. Therefore, a community's level of literacy and knowledge can be the basis on which decision-makers make the right decisions in agriculture, health, natural resource management, and other activities related to water resources. The latter situation calls for better communication among water users through social media and improved education to learn and develop optimal water management.

Evaluating models and developing hydro-social equations

Three soft-computing approaches, namely ANN-LM, ANFIS-SC, and GEP, were applied to develop predictive equations with social indicators worldwide. The ANN-Levenberg–Marquardt (LM) backpropagation algorithm with one hidden layer was applied, and the hidden nodes’ number was determined by trial and error. A hybrid algorithm was combined with the ANFIS-SC models. There is no rule for determining the radii values of the ANFIS-SC models. The final radii values were determined by trial-and-error.

The numbers of neurons in the ANN-LM models and the radii values of the ANFIS-SC models are listed in Table 4. Table 5 lists the results of the soft computing optimal models' estimates of the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and education index (EI), denoted respectively by P1 through P4, during the test period in the world's continents. The activation functions of the output nodes were linear for all the continents. The activation functions of the hidden nodes of the ANN-LM models for the P1 through P4 indicators were respectively the tangent sigmoid, tangent sigmoid, tangent sigmoid, and logarithm sigmoid for Africa; the activation function for the proportion of rural to urban population was the tangent sigmoid for all the continents. Figures 4 and 5 display the characteristics of the ANN (the number of neurons and activation functions of hidden and output layers) and ANFIS-SC (radii values) models, respectively. The values of R and RMSE for Africa corresponding to the ANN-LM models were (0.921, 0.981, 0.858, 0.862) and (0.193, 0.058, 0.190, 0.172), respectively, for the PRUP, PD, IU, and EI parameters. The values of R and RMSE for Africa corresponding to the ANFIS-SC models equaled (0.933, 0.991, 0.868, 0.891) and (0.130, 0.044, 0.186, 0.156), respectively, for the P1 through P4 parameters. Concerning the GEP models, the root relative squared error (RRSE) was selected as the pressure tree's fitness function. The values of RMSE for the GEP models equaled (0.084, 0.029, 0.178, 0.135), (0.197, 0.056, 0.152, 0.163), (0.151, 0.036, 0.123, 0.210), (0.182, 0.039, 0.148, 0.204) and (0.141, 0.030, 0.226, 0.082) for Africa, America, Asia, Europe, and Oceania, respectively. The R, RMSE, and MAE values in Table 5 establish that the GEP model estimates of the PRUP, PD, IU, and EI indicators had the highest R values and the lowest RMSE values. 
The average R values of the best models (GEP) for all selected social parameters equaled 0.942, 0.909, 0.910, 0.889, and 0.947 for Africa, America, Asia, Europe, and Oceania, respectively. These results indicate that the climatic characteristics of the continents influence the performance of the models. The models' performances for Africa and Oceania, where the type B Köppen climate classification is dominant, were the best. The models' performances for Asia and America, which have similar climatic classifications, were nearly equal. The average model performance for Europe, in the type D climate classification, was the poorest among the continents.

Table 4 The characteristics of ANN (the number of neurons) and ANFIS (radii values) models corresponding to social indicators and continents.
Table 5 The results of soft computing optimal models corresponding to the testing period in the world’s continents.
Figure 4

The characteristics of optimal ANN models; showing the number of neurons and activation functions of hidden and output layers.

Figure 5

The characteristic of optimal ANFIS-SC model showing the radii values.

Figures 6, 7, 8, 9 and 10 show the observed and estimated social parameters obtained with the soft-computing models during the test period in Africa, America, Asia, Europe, and Oceania, respectively. Figure 11 compares the R, RMSE, and MAE values from the soft-computing models. The R values for the soft-computing models are close to 1, with the ranking \(R_{GEP} > R_{ANFIS\text{-}SC} > R_{ANN\text{-}LM}\) for all social indicators. Figure 11 establishes that the ANFIS-SC models outperformed the ANN-LM models. Also, the GEP models performed better than the ANFIS-SC and ANN-LM models in estimating the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and education index (EI) parameters in Africa, America, Asia, Europe, and Oceania.

Figure 6

Observed and estimated social parameters during the testing period in Africa.

Figure 7

Observed and estimated social parameters during the testing period in America.

Figure 8

Observed and estimated social parameters during the testing period in Asia.

Figure 9

Observed and estimated social parameters during the testing period in Europe.

Figure 10

Observed and estimated social indicators during the testing period in Oceania.

Figure 11

Comparison of R, RMSE and MAE values corresponding to the soft computing methods.

The main advantage of GEP over other soft computing methods (e.g., ANFIS and ANN) is that it produces explicit predictive equations. The equations obtained with the optimal models for the social indicators (i.e., the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and education index (EI) in Africa, America, Asia, Europe, and Oceania) are listed in Table 6. The structure of the equations that the GEP model discovers does not necessarily correspond to reality. The equations listed in Table 6 are merely the optimal equations extracted from the model after the evolution, for all indicators and in all basins (considering renewable water per capita as a decision variable).

Table 6 Mathematical equations governing hydro-social indicators.

Table 7 lists the performance of the GEP models in estimating the social indicators within three ranges of the estimated values during the test period: the highest 20% (20%max), the middle 60% (60%mid, between 20%min and 20%max), and the lowest 20% (20%min). The results cover the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and the education index (EI) in Africa, America, Asia, Europe, and Oceania. Table 7 shows no consistent rule as to which range is estimated best. Across the three ranges (20%max, 60%mid, and 20%min), the education index has the lowest R values and the population density has the highest R values on every continent. The results therefore indicate a strong pattern of association between population density and the status of water resources on all continents.
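The range-wise evaluation described above can be sketched as follows. The sketch assumes the three ranges are formed by ranking the estimated values and taking the lowest 20%, the middle 60%, and the highest 20%; the study does not spell out the exact splitting rule, so this is only one plausible reading. The function names and the sample series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def split_by_range(observed, estimated, tail=0.2):
    """Group (estimated, observed) pairs by rank of the estimated value."""
    pairs = sorted(zip(estimated, observed))  # ascending by estimate
    n = len(pairs)
    k = round(n * tail)  # number of pairs in each tail
    if k < 2:
        raise ValueError("need at least two pairs in each tail range")
    return {
        "20%min": pairs[:k],
        "60%mid": pairs[k:n - k],
        "20%max": pairs[n - k:],
    }


def range_r(observed, estimated, tail=0.2):
    """R value computed separately inside each of the three ranges."""
    result = {}
    for name, group in split_by_range(observed, estimated, tail).items():
        est = [e for e, _ in group]
        obs = [o for _, o in group]
        result[name] = pearson_r(obs, est)
    return result


# Hypothetical series, for illustration only.
estimated = [0.05 * i for i in range(1, 16)]
observed = [e + (0.01 if i % 2 else -0.01) for i, e in enumerate(estimated)]

for name, r in range_r(observed, estimated).items():
    print(name, round(r, 3))
```

Because each tail holds only a fifth of the test data, the R values in the 20%min and 20%max ranges rest on few points and are more sensitive to individual estimates than the 60%mid value.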

Table 7 The performance of GEP models with respect to selected ranges.

Figure 12 depicts the distribution of the estimated values of the social parameters (Pi, i = 1, 2, 3, 4) and compares them across the continents. A box plot is a graphic display that summarizes the distribution or dispersion of data through five statistical indicators: the minimum, the first quartile Q1 (25%), the median (50%), the third quartile Q3 (75%), and the maximum. Figure 12 shows that Oceania and Africa exhibit the smallest and largest values of the rural to urban population, respectively. America has the lowest values of the first to the third quartile. The estimated population density in Europe has the highest third-quartile (75%) value. The median values of estimated internet users are smallest in Africa and largest in Europe. America has the lowest first quartile, median, third quartile, and maximum of the estimated education index among the continents.
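The five statistics behind each box in a box plot can be computed as in the sketch below. It uses Python's standard `statistics.quantiles` with the "inclusive" method, which is an assumption: quartile conventions differ between tools, and the study does not state which one was used. The helper name and the sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a data series."""
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as spanning the full range from minimum to maximum.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Hypothetical estimated indicator values, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(sample))
```

Comparing continents then amounts to comparing these five numbers indicator by indicator, which is what the box plots display graphically.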

Figure 12

Distribution of estimated data values of social indicators (Pi, i = 1, 2, …, 4).

Table 8 summarizes the performance of the hydro-social equations. The best models rank the indicators as PD > PRUP > EI > IU for Africa, PD > IU > EI > PRUP for America, PD > IU > PRUP > EI for Asia, PD > PRUP > IU > EI for Europe, and PD > EI > IU > PRUP for Oceania.

Table 8 Summary of hydro-social equations performance.

This paper’s results indicate that the pattern of association between social parameters and water resources is complex. Renewable water per capita (RWPC) was estimated from the social indicators PRUP, PD, IU, and EI using gene expression programming. The results of the GEP models estimating RWPC for the testing period in the world’s continents are listed in Table 9. The RMSE values of the optimal GEP models equaled 0.089, 0.058, 0.042, 0.049, and 0.036 for Africa, America, Asia, Europe, and Oceania, respectively. Figure 13 displays the observed and estimated RWPC during the test period in the world’s continents. The equations obtained with the optimal models for the renewable water per capita in Africa, America, Asia, Europe, and Oceania are listed in Table 10. The fitted equations can be applied at variable spatial and temporal scales. The derived equations imply that water resources in Africa and Oceania are governed by the PRUP, PD, IU, and EI indicators. In Europe the PRUP, PD, and IU indicators, and in America and Asia the PD and IU indicators, have the most influence on the status of water resources. The association between social parameters and water resources thus varies among continents. The link between these social indicators and the per capita renewable water is a function of the countries' cultural and economic conditions, and thus bears on future management and policymaking across continents. This study’s results concerning hydro-social indicators are consistent with the findings by Forouzani et al.2, Carey et al.15, Lima et al.25, Pande et al.7, Diep et al.26, and Diaz et al.22.

Table 9 The results of GEP estimating RWPC corresponding to the testing period in the world’s continents.
Figure 13

Observed and estimated RWPC parameters during the test period in the world’s continents.

Table 10 Mathematical equations governing hydro-social indicators.

This paper’s results establish the importance of examining the interactions between climate, the status of water resources, and social indicators. The state and social conditions of a country reflect the status of its water resources. Therefore, this study has shown how significant an impact the management and planning of a country can have on its water resources. Each successful water resources project rests on a successful social setting.

Concluding remarks

One of the most critical issues in water resources systems is their social context, which poses many challenges in systems analysis. The impact of social indicators on water issues, and vice versa, is therefore crucial. However, direct measurement of social indicators and their evaluation in water resources management is complicated, time-consuming, and expensive. Development plans and management measures could be implemented better if social indicators, water indicators, and their interrelationships could be estimated. Studies on hydro-social indicators would thus help predict social changes related to water resources, and vice versa, so that management plans face fewer adverse consequences. Social conditions may preclude meeting water-resources planning objectives, so such planning must be socially grounded to succeed. This study assesses several social indicators worldwide, i.e., the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and education index (EI), and concludes that these indicators have a high correlation with the per capita renewable water. These social indicators must be considered in water policy decisions and in planning for sustainable water management. It is concluded that modeling the association of hydro-social indicators with the per capita renewable water using soft-computing methods is viable and insightful. The GEP models performed better than the ANFIS-SC and ANN-LM models on the performance criteria for the world’s continents. This paper shows that it is possible to estimate the water status and social indicators of a society from the developed hydro-social equations when there is a paucity of information about the social status. These estimates are useful for water resources management.
This study has shown a successful application of hybrid soft computing to determine functional relations between socio-economic parameters and water and environmental resources parameters.

Recommendations

Some specific recommendations for improving future research are as follows:

  • Examining other quantitative and qualitative social indicators.

  • Examining other indicators such as environmental, economic, cultural, and political indicators.

  • Applying other models of soft computing methods for exploring hydro-social relationships.

  • Applying soft computing methods in examining the interrelationships between social indicators and water resources indicators to specific issues of water management, such as flood and drought management.