Assessment of global hydro-social indicators in water resources management

Water is a vital element that plays a central role in human life. This study assesses the status of indicators based on water resources availability relying on hydro-social analysis. The assessment involves countries exhibiting decreasing trends in per capita renewable water during 2005–2017. Africa, America, Asia, Europe, and Oceania encompass respectively 48, 35, 43, 20, and 5 countries with distinct climatic conditions. Four hydro-social indicators associated with rural society, urban society, technology and communication, and knowledge were estimated with soft-computing methods [i.e., artificial neural networks, adaptive neuro-fuzzy inference system, and gene expression programming (GEP)] for the world’s continents. The GEP model’s performance was the best among the computing methods in estimating hydro-social indicators for all the world’s continents based on statistical criteria [correlation coefficient (R), root mean square error (RMSE), and mean absolute error]. The values of RMSE for GEP models for the ratio of rural to urban population (PRUP), population density, number of internet users and education index parameters equaled (0.084, 0.029, 0.178, 0.135), (0.197, 0.056, 0.152, 0.163), (0.151, 0.036, 0.123, 0.210), (0.182, 0.039, 0.148, 0.204) and (0.141, 0.030, 0.226, 0.082) for Africa, America, Asia, Europe and Oceania, respectively. Scalable equations for hydro-social indicators are developed with applicability at variable spatial and temporal scales worldwide. This paper’s results show the patterns of association between social parameters and water resources vary across continents. This study’s findings contribute to improving water-resources planning and management considering hydro-social indicators.

Water resources shortages are caused by climatic variability and change, population growth, and mismanagement posing challenges to meeting the water requirements in many countries 1,2 . Water resources management involves hydraulic and hydrologic issues and must consider social and economic conditions. Early hydro-social research of water systems relied heavily on geographic assessments and introduced methods for understanding the feedbacks between water and human systems 3 . Hydro-social studies are based on recognizing the close interactions between human systems and water, the social and cultural meanings of water, and how they relate to water systems and water management options 4 . The hydro-social cycle, focusing on the feedback systems between human and water interactions, recognizes the human impact on the hydrological cycle as part of the dialectical development of water systems and social systems 5 .
There are several definitions of water scarcity and different interpretations of its meaning. At the most basic level, water scarcity is governed by the quantity and distribution of water and by natural and human factors 2 . Human populations are affected by water stresses and per capita shortages of renewable water. Water scarcity is generally considered a global challenge for humanity 6 . Human actions have caused a diverse set of water sustainability challenges that must be addressed by new approaches to water management 7 . Research has shown that water scarcity results not only from physical water scarcity but also from complex interactions between water resources and social phenomena 6 . Beyond the social and economic crises of water, it has been emphasized that the water problem is not simply about scarcity but is a crisis of water management. Today's goal in managing water systems is to define new interdisciplinary solutions. Hydro-social science studies the interactions between human factors, water flows, hydraulic technologies, biophysical elements, socio-economic structures, and cultural-political institutions in the management of water systems 8 . This science evaluates water resource systems considering human influences such as withdrawals, impoundments, and other human-induced changes in hydrological systems 9 . This means that water shortages and the existence of adverse trends in hydrological systems also affect [...] support seekers), and three types of agricultural specialists (farmer blamer pessimists, technocratic realists, and optimists). Li et al. 23 examined the impact of various socio-economic activities on Lake Tai's water quality in China and demonstrated that severe ecological pressures from repeated and intense socio-economic activities can lead to the decline of the ecological functions of lakes and threaten aquatic organisms' health.
Their results indicate a significant association between the average annual concentration of total nitrogen (TN), total phosphorus (TP), chemical oxygen demand (COD), biological oxygen demand (BOD), population, per capita gross domestic product (GDP), and sewage discharge. Several other studies have reported social-science and soft-computing applications to water resources investigations 7,24–27 . The soft-computing literature is too vast to be reviewed in this paper; therefore, only a small set of references that have applied soft-computing methods is highlighted herein. Various researchers have examined the inclusion of social indicators in assessing water resources to understand which social parameters have the most significant impact on the water system, and how; however, few have quantitatively uncovered the patterns of association that govern hydro-social indicators on a broad scale. It is essential to consider the rationality of the statistical association between social parameters and water resource parameters, and the level of interaction between these factors deserves further research.
This paper develops functions of worldwide application for water-resources factors within the context of hydro-social science. With respect to the current state of the art in hydro-social science, the innovations of this work are: (1) application of soft-computing methods (i.e., artificial neural networks, adaptive neuro-fuzzy inference system, and gene expression programming) for linking hydro-social science and water science; (2) estimation of several social variables (rural society, urban society, technology and communication, and knowledge) as a function of the water resources of the continents, and estimation of water resources in terms of social variables; (3) demonstration that the mathematical functions for social parameters are scalable in space and time.

Methodology
Selected indicators. The renewable water per capita (RWPC) is chosen as the overall indicator of water resource status. The indicators corresponding to rural society, urban society, technology and communication, and knowledge are the ratio of rural to urban population (PRUP), population density (PD), number of internet users (IU), and education index (EI), respectively; each of them is defined below.
• Renewable water per capita (RWPC): Renewable water is the amount of water that a basin can replenish during the annual water cycle. Per capita renewable water is the available volume of renewable water per person every year, measured in millions of cubic meters per person.
• [...] indicator is effective in mass information related to water use. The unit of this parameter is the percentage of the internet-using population with respect to the total population. The term "Water Internet" is reminiscent of water use and internet connectivity. The Water Internet is a source of water supply information for involved organizations and citizens in general 22,34,39 .
• Education index (EI): The educational level is a leading determinant of a person's knowledge about the use of water resources. The education index (EI) is calculated as the average years of schooling received by a population of individuals 34 .
Figure 1 displays the phases of this paper's methodology. This work proposes the social indicators PRUP, PD, IU, and EI to quantify the RWPC in Africa, America, Asia, Europe, and Oceania with soft computing methods (ANN, ANFIS-SC, and GEP). This paper's analysis evaluated two types of functional patterns of association: (1) per capita data on water resources were applied as input and values of social parameters were quantified as output; (2) social parameters were applied as input and the per capita water resources parameter was quantified as output. The components of this paper's methodology are shown in Fig. 2.
The per capita renewable water data and social indicators were normalized for each country with Eq. (1). Normalized values range between 0 and 1.
$$X_N = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$
where $X_N$, $X_i$, $X_{\min}$, and $X_{\max}$ denote the normalized value, the real value, the minimal value, and the maximal value, respectively. Normalization is performed before the modeling process so that the algorithm examines the various dimensions of the data fairly, based on the same standardized range. This work implements a normalization process with Eq. (1) for training and testing 41,42 .
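As an illustration of the normalization step, the following Python sketch applies min-max scaling, X_N = (X_i − X_min) / (X_max − X_min), to a single series. The function name `min_max_normalize` and the sample numbers are hypothetical and are not taken from the study's data; the handling of a constant series is an added assumption.

```python
import numpy as np

def min_max_normalize(values):
    """Scale a 1-D series to [0, 1] using its own minimum and maximum."""
    x = np.asarray(values, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumption: a constant series has no spread, so map it to zeros
        # instead of dividing by zero.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# Hypothetical per capita renewable water series for one country
# (illustrative numbers only, not data from the paper).
rwpc = [5200.0, 5100.0, 4950.0, 4800.0, 4700.0]
rwpc_norm = min_max_normalize(rwpc)
print(rwpc_norm)
```

A decreasing series such as this one maps its first (largest) value to 1 and its last (smallest) value to 0, which is the range stated for Eq. (1).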
This study selected for analysis countries that exhibit a decreasing trend of per capita renewable water in the 13-year study period (2005–2017). The studied countries' names, classified by continent, are listed in Table 1. Africa, America, Asia, Europe, and Oceania include 48, 35, 43, 20, and 5 countries, respectively, across numerous climate conditions. This study randomly chose 70% of each continent's countries for model training and [...]
Soft-computing models. Soft computing methods are effective in detecting new and valuable information from large datasets with the purpose of discovery, classification, and forecasting 44 . The well-known ANNs, ANFIS, and GEP are described in the following sections.
ANNs. Artificial neural networks (ANNs) are computing systems inspired by biological neural networks. The initial aim of neural networks was to solve complex problems by mimicking the human mind. Over time, the focus of ANNs shifted to emulating specific mental abilities. An ANN is based on a set of connected units or nodes called artificial neurons (similar to biological neurons in the animal brain). Any synapse between neurons can transmit a signal from one neuron to another, and the receiving neuron can process the signals. In a conventional ANN the synapse signal is a real number, and a nonlinear function of its inputs calculates each neuron's output. Neurons and synapses apply weights that are adjusted as learning progresses. A weight increases or decreases the strength of the signal sent across the synapse. Neurons are commonly organized in layers. The signals travel from the first layer (input) to the last layer (output), and they may traverse the layers multiple times 45–47 . Three fundamental characteristics of ANNs for determining an optimal solution are (1) the applied algorithm, (2) the activation functions, and (3) the number of neurons, as follows:
1. The applied algorithm and layer characteristics: The Levenberg-Marquardt (LM) algorithm with three layers was selected for use in this study due to its faster convergence in training networks. The error propagation algorithm changes the network weights and bias values so that the error decreases more rapidly.
2. Activation functions: Selecting the activation function has a significant effect on the accuracy of the network output. There are three main activation functions for neural network modeling: the logsig, tansig, and purelin functions. Several combinations of activation functions were tested to find the best combination in a network with one to three hidden layers. The logsig, tansig, and purelin functions were applied in the hidden and output layers. Using one or a combination of these activation functions (between layers) may lead to an optimal model with the highest correlation value and the smallest error.
3. The neuron number determination: Determining the number of hidden-layer neurons that yields a network with the least error in predicting the desired outputs is essential. Trial and error is the best way to determine the optimal number of neurons in the hidden layer of ANN models 48–50 . The number of neurons in the hidden layer has a significant effect on the neural network's function. Using a small number of neurons prevents the neural network from learning most of the patterns accurately. On the other hand, a large number of neurons leads to memorization of the training patterns and thus prevents the neural network from learning to recognize their basic features.
The authors wrote the ANN program code in MATLAB.
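The authors' MATLAB code is not reproduced in this paper. As an illustration only, the following Python sketch shows the three activation functions named above and a single forward pass through a network with one hidden layer. The weights are random placeholders, the hidden-layer size of 4 is a hypothetical choice, and Levenberg-Marquardt training is not shown.

```python
import numpy as np

def tansig(n):
    """Hyperbolic tangent sigmoid; output lies in (-1, 1)."""
    return np.tanh(n)

def logsig(n):
    """Logistic sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def purelin(n):
    """Linear (identity) transfer function."""
    return n

def forward(x, w_hidden, b_hidden, w_out, b_out,
            hidden_act=tansig, out_act=purelin):
    """One forward pass: input -> hidden layer -> output layer."""
    hidden = hidden_act(w_hidden @ x + b_hidden)
    return out_act(w_out @ hidden + b_out)

# Placeholder (untrained) weights; in practice these are fitted during training.
rng = np.random.default_rng(0)
n_hidden = 4  # hypothetical; the text selects this number by trial and error
w_hidden = rng.normal(size=(n_hidden, 1))
b_hidden = rng.normal(size=n_hidden)
w_out = rng.normal(size=(1, n_hidden))
b_out = rng.normal(size=1)

x = np.array([0.35])  # an illustrative normalized input value
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)
```

Swapping `hidden_act` between `tansig` and `logsig` while keeping a linear output mirrors the kind of activation-function combinations that the text describes testing.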

ANFIS.
A neuro-fuzzy inference system is an artificial neural network based on the Takagi-Sugeno fuzzy system 51 ; it uses a set of fuzzy if-then rules and learns to identify nonlinear fitting functions. ANFIS is a universal estimator 52 , which is herein applied. If-then fuzzy rules are required to specify functions between the fuzzy variables of a fuzzy system. Equations (2) and (3) give a typical rule set with two fuzzy if-then rules in a first-order Sugeno system:
$$\text{Rule 1: if } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ then } f_1 = p_1 x + q_1 y + r_1 \qquad (2)$$
$$\text{Rule 2: if } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ then } f_2 = p_2 x + q_2 y + r_2 \qquad (3)$$
where A1 (LOW), A2 (LOW), and B1 (HIGH), B2 (MEDIUM) denote the membership functions (MFs) for inputs x and y, respectively, and $p_i$, $q_i$, and $r_i$ ($i = 1, 2$) denote the consequent parameters of the output functions $f_i$. The ANFIS method with Subtractive Clustering (ANFIS-SC) is herein applied. Subtractive Clustering (SC) is an extension of the mountain clustering method proposed by Yager and Filev 53 , in which the data are clustered by evaluating the potential of data in the specification space 54 . The number of clusters in ANFIS-SC depends on the cluster radius, whose effective value is determined by trial and error. The radius varies from 0.20 to 0.60 42 . Linear least squares (LLS) are applied to determine the MF's output, following previous works 42,50,55 . The authors wrote the code for ANFIS-SC in MATLAB.
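To make the two-rule first-order Sugeno system concrete, the following Python sketch evaluates such a rule set for one pair of inputs. It is a minimal sketch under stated assumptions: Gaussian membership functions, product firing strengths, and all numerical parameters (centers, widths, and the linear consequent coefficients named `pqr` here) are hypothetical and are not taken from the study.

```python
import math

def gauss_mf(v, center, sigma):
    """Gaussian membership function (an assumed MF shape)."""
    return math.exp(-0.5 * ((v - center) / sigma) ** 2)

def sugeno_two_rules(x, y, rules):
    """Weighted average of linear rule outputs in a first-order Sugeno system."""
    num = 0.0
    den = 0.0
    for rule in rules:
        # Firing strength: product of the memberships of x and y.
        w = gauss_mf(x, *rule["A"]) * gauss_mf(y, *rule["B"])
        p, q, r = rule["pqr"]
        f = p * x + q * y + r  # linear consequent of this rule
        num += w * f
        den += w
    return num / den

# Hypothetical rule parameters: (center, sigma) for each MF and (p, q, r).
rules = [
    {"A": (0.2, 0.15), "B": (0.8, 0.15), "pqr": (1.0, 0.5, 0.0)},
    {"A": (0.7, 0.15), "B": (0.4, 0.15), "pqr": (-0.5, 1.0, 0.2)},
]
out = sugeno_two_rules(0.3, 0.6, rules)
print(out)
```

Because the output is a weighted average, it always lies between the smallest and largest individual rule outputs for the given inputs.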

GEP.
Gene expression programming (GEP) is a method of mathematical modeling based on evolutionary computation and inspired by natural evolution. This method was introduced by Ferreira 56 in 1999 and advanced in 2001. The GEP algorithm integrates the dominant features of its two predecessor evolutionary algorithms to resolve their weaknesses. GEP features a chromosome genotype similar to that of a genetic algorithm, and the phenotype of a chromosome has a tree structure of variable length and size similar to that of the genetic programming algorithm 56 [...] fitting functions are the mutation rate (MR), the inversion rate (IR), the IS transposition rate (ISTR), the RIS transposition rate (RISTR), the one-point recombination rate (OPRR), the two-point recombination rate (TPRR), the gene recombination rate (GRR), and the gene transposition rate (GTR), whose values are listed in Table 3. The penalizing tool with parsimony pressure was applied in this study, which implements several mathematical functions to predict the hydro-social indicators (+, −, *, /, ln x, e^x, x^2, [...]). This study first employs the terminal set for the renewable water per capita indicator with output sets containing PRUP, PD, IU, and EI, and then employs the terminal sets for the cited social indicators with the renewable water per capita indicator as output. This study employs the soft-computing software GeneXpro Tools 4.0.
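The link between a linear chromosome (genotype) and an expression tree (phenotype) can be sketched in Python as follows. This is an illustrative decoder for a single gene written in breadth-first (Karva) order, not the GeneXpro Tools implementation; the one-letter symbols 'L', 'E', and 'S' for ln, exp, and square, the single terminal 'x', and the example gene are assumptions made for this sketch.

```python
import math
from collections import deque

# Function symbols and their arities; 'L' = ln, 'E' = exp, 'S' = square.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "L": 1, "E": 1, "S": 1}

FUNCS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,   # unprotected division, kept simple here
    "L": math.log,             # raises ValueError for non-positive input
    "E": math.exp,
    "S": lambda a: a * a,
}

class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []

def decode(gene):
    """Build the expression tree level by level from a gene string."""
    root = Node(gene[0])
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        # Terminals have arity 0, so they receive no children.
        for _ in range(ARITY.get(node.symbol, 0)):
            child = Node(gene[pos])
            pos += 1
            node.children.append(child)
            queue.append(child)
    return root  # symbols after position `pos` are left unused

def evaluate(node, x):
    """Recursively evaluate the tree for a given value of the terminal x."""
    if node.symbol == "x":
        return x
    args = [evaluate(child, x) for child in node.children]
    return FUNCS[node.symbol](*args)

# Hypothetical gene: head "*+S" (length 3) and tail "xxxx" (length 4).
gene = "*+Sxxxx"
tree = decode(gene)
print(evaluate(tree, 0.5))  # encodes (x + x) * x**2
```

Filling the tail with terminals only guarantees that any head yields a complete tree, which is why genetic operators such as mutation or transposition can modify the string without producing an invalid expression.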
Evaluating model performance. The goodness-of-fit correlation coefficient (R), root mean squared error (RMSE), and mean absolute error (MAE) were applied to evaluate the models' performance. The R, RMSE, and MAE are respectively defined as follows:
$$R = \frac{\sum_{i=1}^{N} \left(hydrosocio_{io} - \overline{hydrosocio}_{o}\right)\left(hydrosocio_{ie} - \overline{hydrosocio}_{e}\right)}{\sqrt{\sum_{i=1}^{N} \left(hydrosocio_{io} - \overline{hydrosocio}_{o}\right)^{2} \sum_{i=1}^{N} \left(hydrosocio_{ie} - \overline{hydrosocio}_{e}\right)^{2}}}$$
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(hydrosocio_{io} - hydrosocio_{ie}\right)^{2}}$$
$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| hydrosocio_{io} - hydrosocio_{ie} \right|$$
where $\overline{hydrosocio}_{o}$ and $\overline{hydrosocio}_{e}$ denote the average observed and estimated hydro-social indicators' values, respectively, $hydrosocio_{io}$ and $hydrosocio_{ie}$ denote the observed and estimated hydro-social indicators' values, respectively, and N denotes the number of data.
The correlation coefficient (R) measures the degree of statistical association (positive or negative) between variables. The RMSE measures the goodness of fit by comparing the estimated values with the observed values, giving higher weight to high values. The MAE measures the distribution of goodness of fit at moderate values 57 . The models' performances are optimal if the R and RMSE are close to 1 and 0, respectively. This study also employs various graphical methods to display models' results.
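The three criteria can be computed directly from paired observed and estimated values. The following Python sketch uses the standard definitions of the Pearson correlation coefficient, RMSE, and MAE; the function names and the sample arrays are hypothetical and do not come from the study's results.

```python
import numpy as np

def correlation_coefficient(obs, est):
    """Pearson correlation between observed and estimated values."""
    o = np.asarray(obs, dtype=float)
    e = np.asarray(est, dtype=float)
    do = o - o.mean()
    de = e - e.mean()
    return float(np.sum(do * de) / np.sqrt(np.sum(do ** 2) * np.sum(de ** 2)))

def rmse(obs, est):
    """Root mean squared error."""
    o = np.asarray(obs, dtype=float)
    e = np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((o - e) ** 2)))

def mae(obs, est):
    """Mean absolute error."""
    o = np.asarray(obs, dtype=float)
    e = np.asarray(est, dtype=float)
    return float(np.mean(np.abs(o - e)))

# Illustrative normalized values only (not results from the paper).
observed = [0.10, 0.25, 0.40, 0.55, 0.70]
estimated = [0.12, 0.22, 0.43, 0.50, 0.74]

print(correlation_coefficient(observed, estimated))
print(rmse(observed, estimated))
print(mae(observed, estimated))
```

For a perfect model the estimated series equals the observed one, so R is 1 while RMSE and MAE are both 0; because the squared errors are averaged before the square root, RMSE is never smaller than MAE.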

Results and discussion
Evaluating indicators. Among the selected parameters, the ratio of rural to urban population relates directly to the per capita renewable water, whereas the population density, internet users, and education index exhibit an inverse relation with the per capita renewable water worldwide. That is, the per capita renewable water decreases with a decreasing ratio of rural to urban population and with increasing population density, internet users, and education index. The urban population has increased in developing regions, which feature increasing population density. People's health is threatened by poor urban sanitary infrastructure, leading to disease and social decay. Increasing population density and a reduction in per capita renewable water inflict social harm and disrupt society's economic growth 58 . Population density also is positively related to the relative number of elderly and to social vulnerability because potential casualties increase with population size 40 . On the other hand, with the increase of Internet users and education index, the per capita renewable water has increased. As long as the knowledge and awareness of communities improved, the consumption algorithm decreased, leading to a reduction of renewable water per capita. Therefore, a community's level of literacy and knowledge can be the basis on which decision-makers make the right decisions in agriculture, health, natural resource management, and other activities related to water resources. The latter situation calls for better communication among water users through social media and for improved education to learn and develop optimal water management.
Evaluating models and developing hydro-social equations. Three soft-computing approaches, [...] Table 4. The activation functions of the output nodes were linear for all the continents. The activation functions of the hidden nodes of the ANN-LM models for the P1 through P4 indicators were, respectively, the tangent sigmoid, tangent sigmoid, tangent sigmoid, and logarithmic sigmoid for Africa; the activation function for the proportion of rural to urban population was the tangent sigmoid for all the continents. Table 5 lists the results of the soft computing optimal models' estimates of the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and education index (EI), denoted respectively by P1 through P4, during the test period in the world's continents. These results indicate that the climatic characteristics of the continents influence the performance of the models. The models' performances for Africa and Oceania, associated with the type B dominant Köppen climate classification, were the best. The models' performances for Asia and America, which have similar climatic classifications, were nearly equal. The average model performance for Europe, in the type D climate classification, was the poorest among the continents. Figures 6, 7, 8, 9 and 10 show the observed and estimated social parameters obtained with the soft-computing models during the test period in Africa, America, Asia, Europe, and Oceania, respectively. Figure 11 compares the R, RMSE, and MAE values from the soft-computing models. The R values for the soft-computing models are close to 1, with the ranking R_GEP > R_ANFIS-SC > R_ANN-LM for all social indicators. Figure 11 establishes that the ANFIS-SC models exceeded the ANN-LM models' performance.
Also, the GEP models had better performance than the ANFIS-SC and ANN-LM for estimating the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and education index (EI) parameters in Africa, America, Asia, Europe, and Oceania.
The main advantage of the GEP over other soft computing methods (e.g., ANFIS and ANN) is that it produces predictive equations. The equations obtained with the optimal models for the social indicators (i.e., the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and education index (EI) in Africa, America, Asia, Europe, and Oceania) are listed in Table 6. The equations that the GEP model discovers as a structure do not necessarily correspond to reality. The equations listed in Table 6 merely show the optimal equations extracted from the model after the evolution, for all indicators and in all basins (considering renewable water per capita as a decision variable). The performance of the GEP models in estimating the social indicators in three ranges of values, namely, 20% of the maximum estimated values (20%max), 60% of median estimated values (60%mid or 20%min to 20%max), and 20% of minimum estimated values (20%min), during the test period for the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and education index (EI) parameters of Africa, America, Asia, Europe, and Oceania is listed in Table 7. Table 7's results indicate there is no regular rule for determining which of the cited ranges performs best. The education index and the population density have, respectively, the lowest and highest R values among the parameters in the three different ranges (20%max, 60%mid, and 20%min) in Africa, America, Asia, Europe, and Oceania. Therefore, the results indicate a strong pattern of association between the population density parameter and water resources status in all continents of the world. Figure 12 depicts the distribution of estimated data values of the social parameters (i = 1, 2, 3, 4) and their comparison across the continents. The box plots are a graphic display integrating multiple numerical relations.
One approach to understanding the distribution or dispersion of data is through the box diagram, which is based on the "minimum," "first quartile Q1 (25%)," "median (50%)," "third quartile Q3 (75%)," and "maximum" statistical indicators. Figure 12 shows Oceania and Africa exhibit the smallest and largest values of the rural to urban population, respectively. America has the lowest values of the first to the third quartile. The estimated population density in Europe has the highest values in the third quartile (75%). The median of estimated internet users is smallest in Africa and largest in Europe. America has the lowest values of the first quartile, median, third quartile, and maximum associated with the estimated education index values among the continents. A summary of the hydro-social equations' performance is listed in Table 8, which shows that the best models' performances rank as PD > PRUP > EI > IU, PD > IU > EI > PRUP, PD > IU > PRUP > EI, PD > PRUP > IU > EI, and PD > EI > IU > PRUP for Africa, America, Asia, Europe, and Oceania, respectively. This paper's results indicate the pattern of association between social parameters and water resources is complex. Renewable water per capita was estimated using the social indicators PRUP, PD, IU, and EI based on gene expression programming. The results of GEP for estimating RWPC corresponding to the testing period in the world's continents are listed in Table 9. The values of RMSE for optimal GEP models equaled 0.089, 0.058, 0.042, 0.049, and 0.036 for Africa, America, Asia, Europe, and Oceania, respectively. Figure 13 displays the observed and estimated RWPC parameter during the test period in the world's continents. The equations obtained with the optimal models for the renewable water per capita in Africa, America, Asia, Europe, and Oceania are listed in Table 10. The fitted equations can be applied at variable spatial and temporal scales. The derived equations imply that water resources in Africa and Oceania are governed by the PRUP, PD, IU, and EI indicators.

Table 5. The results of soft computing optimal models corresponding to the testing period in the world's continents.
Also, the PRUP, PD, and IU indicators in Europe and the PD and IU indicators in America and Asia have the most influence on their water resources status. The association between social parameters and water resources varies across continents. The linking of these social indicators with the per capita renewable water is a function of the countries' cultural and economic conditions, thus bearing on future management and policymaking across continents. This study's results concerning hydro-social indicators are consistent with the findings by Forouzani et al. 2 [...]
This paper's results establish the importance of examining the interactions between climate, the status of water resources, and social indicators. The state and social conditions of a country reflect the status of its water resources. Therefore, this study has shown how significant an impact the management and planning of a country can have on its water resources. Each successful water resources project rests on a successful social setting.

Concluding remarks
One of the most critical issues in water resources systems is their social context, which poses many challenges in systems analysis. Therefore, the impact of social indicators on water issues, and vice versa, is crucial. On the other hand, direct measurement of social indicators and their evaluation in water resources management is complicated, time-consuming, and expensive. Therefore, it would be beneficial to implement development [...] meeting water-resources planning objectives. Therefore, such planning must be socially grounded for its success. This study assesses several social indicators, i.e., the proportion of rural to urban population (PRUP), population density (PD), internet users (IU), and education index (EI) worldwide, and concludes these indicators have a high correlation with the per capita renewable water. These social indicators must be considered in water policy decisions and in planning for sustainable water management. It is concluded that modeling the association of hydro-social indicators with the per capita renewable water using soft-computing methods is viable and insightful. The GEP models performed better than the ANFIS-SC and ANN-LM models for the world's continents according to the performance criteria. This paper shows it is possible to estimate the water status and social indicators of a society based on the developed hydro-social equations when there is a paucity of information about the social status. These estimates are useful for water resources management. This study has shown a successful application of hybrid soft computing to determine functional relations between socio-economic parameters and water and environmental resources parameters.

Recommendations
Some specific recommendations for improving future research are as follows:
• Examining other quantitative and qualitative social indicators.
• Examining other indicators such as environmental, economic, cultural, and political indicators.
• Applying other models of soft computing methods for exploring hydro-social relationships.
• Applying soft computing methods in examining the interrelationships between social indicators and water resources indicators to specific issues of water management, such as flood and drought management.

Data availability
All the required data have been presented in our article.