Source identification and potential health risks from elevated groundwater nitrate contamination in Sundarbans coastal aquifers, India

In recent years, nitrate contamination of groundwater has increased rapidly and has become a concern in water management research. In our study, fourteen nitrate conditioning factors were used, and a multi-collinearity analysis was carried out. Among all variables, pH is the most important and ranks first, with a value of 0.77; it controls the nitrate concentration in the coastal aquifer in South 24 Parganas. The second most important factor is Cl⁻, with a value of 0.71. Other factors, namely As, F⁻, EC and Mg²⁺, occupy the third, fourth and fifth positions, with values of 0.69, 0.69, 0.67 and 0.55, respectively (As and F⁻ share the same value). Owing to contaminated water, people of this district suffer from several health problems, such as kidney damage (around 60%), liver damage (about 40%), low pressure due to salinity, fever, and headache. The applied method can be used in other regions to predict nitrate concentration and to justify changes to management strategies.


Hydrogeological setting
The whole of South 24 Parganas district lies within the Gangetic delta. A large area in the southern portion of the district is covered by the Sundarban Biosphere Reserve (SBR), and rivers such as the Matla, Thakuran, Raidighi, Bidya, Raimangal and Saptamukhi flow across it. The district includes several islands, i.e., Sagar Island, Fraserganj, Lothian Island, Bulcherry, Halliday Island, Dalhousie Island, and Bangaduni Island at the mouth of the river Gosaba; a few of these islands are submerged under seawater. The study area also includes the primary intertidal deltaic mass and the coastal sand associated with estuaries and tidal streams; alluvial and marine silt of Quaternary age make up the majority of the district's geological features in the Bengal basin 23. Das et al. 24 state that, although delta formation is still ongoing, the northern portion of South 24 Parganas is a component of the active delta zone; the confined aquifer serves as the primary supply of drinking water in this area, and deeper aquifers have also been observed there. According to Datta and Kaul 25, depending on their vertical position, aquifers range in depth from 160 to 335 m and are notable sources of drinking and irrigation water; Tertiary silt and alluvium from the Pleistocene to the present comprise the majority of the aquifer strata in this region. The region largely falls within the lower Ganga basin, whose Holocene sediments were predominantly deposited in lacustrine, marine, and fluvial settings 26. The porous alluvial and coastal sediments in the area allow undesirable pollutants to seep and infiltrate into the groundwater aquifer 27; at a depth of 160 to 400 m below the surface, the aquifer is composed primarily of freshwater layers, whereas the shallow aquifer, about 60 m below the surface, is dominated by salty water. In the study region, the parent rock plays a noteworthy role in salinity intrusion, hydro-geological interaction and cation exchange, which significantly affect water quality 28.

Methodology and data sources
A large amount of data is required for this research; a total of 58 samples were collected throughout the district. Tube well sampling locations in the region were identified with Google Earth Pro software, and GPS was used to document and record the data. Before sampling, groundwater was pumped for 10 to 15 min to flush out standing water. -density washed bottles were used to collect the water (Jaydhar et al. 26). The samples were then immediately transferred to the Burdwan University laboratory and stored below 5 °C for laboratory analysis of the hydro-chemical properties of the groundwater. Cations and anions were determined by ion chromatography using a Dionex ICS-90. Inductively coupled plasma mass spectrometry was used to analyse As (Islam et al. 3). Quality control tools and critical laboratory procedures were used for quality assurance of the groundwater analyses. The Logistic Regression (LR) method was used for this research, and ArcGIS 10.2.4 was used to prepare thematic layers of the parameters: water depth, water temperature, salinity, EC, pH, K⁺, Mg²⁺, Na⁺, As, F⁻, Cl⁻, HCO₃⁻, PO₄²⁻, SO₄²⁻, and NO₃⁻. The NO₃⁻ susceptibility map and the human health hazard map were prepared in ArcGIS 10.2.4. Piper and USSL diagrams were constructed to describe the water quality. The flow chart of the methodology is shown in Fig. 2.

Logistic regression
Logistic Regression (LR) is an important and commonly used model, and researchers have applied it in many fields (Pradhan and Lee 29). In practice it can be challenging to use, because the LR model imposes strict assumptions, which adds to the difficulty of the approach in this study. Several statistical approaches based on the LR model overcome this difficulty by formulating a more straightforward procedure that draws on bivariate analyses such as the frequency ratio 30. Although the LR method is more suitable than other methods, it has several drawbacks; to address them, multiple studies combine LR with bivariate analysis. Despite these drawbacks, one advantage of the LR model is that it can handle discrete and continuous data separately or together. The LR model was run in the "Statistical Package for Social Science (SPSS) V 15 programme". LR is calculated with the following equations:

$$P = \frac{1}{1 + e^{-z}}$$

$$z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$$

where $P$ is the probability of a particular observation, $z$ is the linear combination of the factors, $\beta_0$ is the intercept of the algorithm, $n$ is the number of conditioning factors, $X_1, \ldots, X_n$ are the conditioning factors, and $\beta_1, \ldots, \beta_n$ represent the contribution of each independent variable.
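As an illustration only, the LR probability can be sketched in a few lines of Python, assuming the standard logistic form P = 1 / (1 + e^(−z)) with z = β0 + β1X1 + ... + βnXn. The function name, coefficients and factor values below are hypothetical and are not the fitted SPSS results of this study.

```python
import math


def logistic_probability(intercept, coefficients, factors):
    """Return P = 1 / (1 + exp(-z)), where z = intercept + sum(b_i * x_i)."""
    if len(coefficients) != len(factors):
        raise ValueError("one coefficient is needed per conditioning factor")
    z = intercept + sum(b * x for b, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical intercept, coefficients and factor values (illustration only).
# Here z = -1.0 + 0.5 * 2.0 + 0.25 * 4.0 = 1.0, so P is about 0.731.
p = logistic_probability(-1.0, [0.5, 0.25], [2.0, 4.0])
print(round(p, 3))
```

A probability close to 1 would indicate a location that the model considers highly susceptible, and a value close to 0 the opposite.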

Health risk estimation (HRE)
The health risk to the population was estimated with the equations introduced by US EPA 31 (Eq. 1 onwards), in which 'CDId' is the chronic daily dermal dose of a trace element (μg/kg/day); 'SA' is the exposed skin area; 'Kp' is the permeability coefficient; 'ET' is the exposure time to the contaminant (h/day); and 'CF' is the unit conversion factor (L/cm³).
The hazard quotient (HQ) of every trace element was calculated with Eq. 5; the RfD of every contaminant was taken from US EPA regulations 31. The probable health risk to the population was estimated with Eq. 6, where HI is the Health Risk Index.
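The sketch below shows how such a health risk estimate is commonly computed. It is not taken from this paper. It assumes the widely used US EPA-style forms CDId = (C × SA × Kp × ET × EF × ED × CF) / (BW × AT), HQ = CDI / RfD and HI = ΣHQ. The symbols C (concentration), EF (exposure frequency), ED (exposure duration), BW (body weight) and AT (averaging time) are assumptions, because they are not defined in the text above, and all numbers are hypothetical.

```python
def dermal_cdi(conc, skin_area, kp, exposure_time, exposure_freq,
               exposure_duration, conversion, body_weight, averaging_time):
    """Chronic daily dermal dose, assuming the common US EPA-style form.

    conc, exposure_freq, exposure_duration, body_weight and averaging_time
    are assumed inputs that the text above does not define.
    """
    numerator = (conc * skin_area * kp * exposure_time
                 * exposure_freq * exposure_duration * conversion)
    return numerator / (body_weight * averaging_time)


def hazard_quotient(cdi, rfd):
    """HQ is the ratio of the estimated dose to the reference dose (RfD)."""
    return cdi / rfd


def hazard_index(hazard_quotients):
    """HI is taken here as the sum of the individual hazard quotients."""
    return sum(hazard_quotients)


# Hypothetical adult exposure values, for illustration only.
cdi = dermal_cdi(conc=10.0, skin_area=18000.0, kp=0.001, exposure_time=0.5,
                 exposure_freq=365.0, exposure_duration=30.0, conversion=0.001,
                 body_weight=70.0, averaging_time=10950.0)
hq = hazard_quotient(cdi, rfd=1.6)  # the RfD value is also hypothetical
print(cdi, hq, hazard_index([hq]))
```

Under this convention, an HI above 1 is usually read as a potential non-carcinogenic risk. The same functions can be called once per age group (adults, children) with group-specific body weight and skin area.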

Physical properties of groundwater in coastal aquifers
Each of the selected conditioning factors has unique physical and chemical characteristics that play a significant role in regulating the water quality of a given location. This is especially true in the complex coastal zone, where aquifer quality is influenced by both land and seawater. In general, the distributional pattern of the conditioning factors chosen for this study varies across the study area rather than remaining constant. The descriptive statistics in Table 1 summarise the distribution of all adopted conditioning factors. EC, temperature, and pH range over 340.84-4773.8 (Fig. 3a), 23.19-28 °C (Fig. 3b), and 7.55-8.81 (Fig. 3c), respectively; the highest EC was observed in Diamond Harbour I and II blocks, and the north-western and north-eastern parts experienced higher temperatures. Salinity ranges from 0.20 to 1.61 mg/l (Fig. 3e) and groundwater depth from 0.06 to 33.39 m (Fig. 3n). Another critical component, F⁻, has an average value of 0.79 and ranges from 3.76 to 0.002 mg/l (Fig. 3d) … to 1.09 mg/l (Fig. 3f), 737.71 to 15.38 mg/l (Fig. 3g) and 40.95 to 1.03 mg/l (Fig. 3h), respectively. As, PO₄²⁻, and SO₄²⁻ are very distinctive hydro-chemical properties of groundwater; their average values are 0.204 mg/l, 2.29 mg/l and 31.87 mg/l (Table 1), and their values range from 0.37 to 0.11 mg/l (Fig. 3m), 4.60 to 0.62 mg/l (Fig. 3l) and 184.76 to 0.002 mg/l (Fig. 3k), respectively. In addition, Fig. 3i and j show the spatial distribution of Cl⁻ and HCO₃⁻.
The distributional pattern is very uneven throughout the study region: the highest salinity was observed in the middle and northern parts of the study area, whereas the concentration of Mg²⁺, another important causative factor, is high in the western part of the district; Na⁺ is high near Diamond Harbour II, and K⁺ is mostly found in the northern and north-eastern parts of the study region.

Correlation among hydro-chemical parameters
All groundwater samples were characterised by distinctive hydro-chemical compositions. The correlations among these physicochemical characteristics, computed with Pearson's correlation matrix analysis in SPSS software, are shown in Fig. 4. The validity of the results is supported by the statistical analysis, which included descriptive statistics and Pearson's correlation and logically supported the choice of parameters. After analysing all groundwater samples, several conditioning factors were considered, including As, PO₄²⁻, SO₄²⁻, HCO₃⁻, Cl⁻, K⁺, Na⁺, Mg²⁺, F⁻, pH, EC, depth, temperature, and salinity. Our research shows that some causative factors are strongly positively or negatively correlated with each other. Figure 4 shows that NO₃⁻ and K⁺ have significant interdependence (0.702); Cl⁻ is strongly correlated with Na⁺ (0.821), EC (0.947) and Mg²⁺ (0.664), whereas Na⁺ has distinctive interdependence with HCO₃⁻ (0.982) and EC (0.833). Apart from these, the interdependence among the remaining parameters is negligible. This result helps us understand the interdependence among all adopted conditioning factors and is very useful in determining the appropriate causative factors in the current research work.
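For readers who want to reproduce a single cell of such a correlation matrix outside SPSS, a minimal sketch of Pearson's r follows. The function name and the sample values are hypothetical; they are not the measured concentrations of this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical Cl- and EC readings for five wells (illustration only).
chloride = [120.0, 300.0, 450.0, 610.0, 900.0]
conductivity = [400.0, 950.0, 1500.0, 2100.0, 3000.0]
print(round(pearson_r(chloride, conductivity), 3))
```

Values near +1 or −1 indicate strong linear interdependence between two parameters, and values near 0 indicate a negligible linear relationship.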

Multi-collinearity assessment of variables
We used multi-collinearity analysis to check the linear relationship among variables. Fourteen hydro-chemical properties were used for the analysis. The Variance Inflation Factor (VIF) and Tolerance of the samples are shown in Table 2. VIF and Tolerance are inversely related: as the VIF value increases, the Tolerance value decreases. For EC, Cl⁻ and As, the Tolerance values are 0.056, 0.041 and 0.021, which are below the threshold value. Na⁺ has the highest VIF value, 8.75. In our study, the VIF values remain within 10, so we can say that there is no multi-collinearity problem among the variables.
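The relationship between the two diagnostics can be sketched as follows, assuming the standard definitions Tolerance = 1 − R² and VIF = 1 / Tolerance, where R² comes from regressing one factor on all the others. The function names are hypothetical, and the limit of 10 is the one quoted in the text.

```python
def tolerance_from_r2(r_squared):
    """Tolerance of a factor, given R^2 of its regression on the other factors."""
    return 1.0 - r_squared


def vif_from_r2(r_squared):
    """Variance Inflation Factor, the reciprocal of the tolerance."""
    return 1.0 / tolerance_from_r2(r_squared)


def has_multicollinearity(vif, vif_limit=10.0):
    """Flag a factor whose VIF exceeds the chosen limit (10 in this study)."""
    return vif > vif_limit


# Hypothetical R^2 value, for illustration only.
print(vif_from_r2(0.5), tolerance_from_r2(0.5))
print(has_multicollinearity(8.75))
```

Because the two measures are reciprocals, a VIF of 8.75 corresponds to a tolerance of about 0.114.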

Population pressure related stress on water quality
In many countries, coastal tourism is increasing rapidly, and it negatively affects the water, air and other environments of coastal regions 32. In our study, we assessed the effect of population pressure on water quality. The population density of the district varies from one block to another. The average population density of the district is 819 persons per sq. km, which is 214% more than the Indian population density. We classified the stress on water quality into five zones: very high, high, moderate, low, and very low. The north-western and northern parts of the district are under very high population pressure; southern islands such as Sagar Island and the southern part of Namkhana are under less stress; the south-eastern and some north-eastern parts show moderate stress (Fig. 5). Owing to the population density of this region, people suffer from a scarcity of pure drinking water. They depend only on shallow and deep tube wells, whose water is comparatively arsenic- and fluoride-contaminated, for their daily potable water; pond water is used for other activities such as bathing and toilet use. Because this contaminated water is the main source of drinking water, residents of this region suffer from several diseases, such as diarrhoea and kidney damage.

Groundwater vulnerability and health risk analysis
In South 24 Parganas district, various patterns of health risk were observed. Some blocks show high health risk, and a few show low health risk. The study area was divided into five classes (very high, high, moderate, low and very low risk zones) based on local conditions, because every location has a distinctive setting (Fig. 6). The derived groundwater vulnerability and the corresponding health risk are controlled by regional geohydrological conditions as well as several environmental factors, including closeness to the ocean, geological setting, and aquifer depth, which significantly control the region's groundwater status. Maheshtola, Diamond Harbour II, Falta, Budge Budge, western Bhangar and the northern part of Kulpi show very high health risk (Fig. 6); the southern part of Kulpi, some parts of Patharpratima, Jaynagar I and II, and north-eastern Canning II show high human health hazard. A moderate human health hazard is observed in Baruipur, Magarhat II, the major part of Gosaba and a few parts of Kakdwip. Major parts of the district, such as Sagar Island, the southern part of Namkhana, some parts of Basanti, and the southern portion of the study area, fall under low human health hazard (Fig. 6). The hazard quotient (HQ) results for adults and children for the four selected parameters are presented in Supplementary Table 1.

Hydro-chemical properties
The Piper diagram makes the chemistry of a water sample easy to interpret, and sources of groundwater contamination can be inferred from it. The Piper diagram (Fig. 7) shows that most samples fall under the alkaline type (Na⁺ + K⁺), with a pH of 8.5; this type is associated with poor soil structure and low infiltration capacity. Sodium chloride and mixed types of samples are also found in the study area. From the diagram (Fig. 7) we can infer that in most wells strong acids exceed weak acids. Agricultural surface runoff is the main source of HCO₃⁻ 33, and Na⁺ in groundwater is elevated owing to the cation exchange capacity of clay. In groundwater, a high concentration of alkaline organisms makes the water unfit for consumption.

Model evaluation
Appropriate validation procedures are essential to any scientific investigation; without them, the results obtained have no practical value. In this research, six statistical validation measures were employed to validate the derived predictions against ground-level data: specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), F score and the area under the receiver operating characteristic curve (ROC-AUC). The samples were divided into training and validation sets. These measures are computed from four quantities: true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP). The resulting values determine how accurate the adopted model is; greater values indicate better results from the model, and vice versa 34. The validation results are shown in Table 3; among all the validating techniques, AUC-ROC gives the higher values. Therefore, the results confirm the model accuracy; the adopted LR model is acceptable in this region given its geographical conditions. Fig. 8 shows the performance of all adopted validating techniques graphically.
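The confusion-matrix measures named above can be computed as in the following sketch. It assumes that "F score" refers to the F1 score (the harmonic mean of PPV and sensitivity), and the counts used are hypothetical rather than the values behind Table 3.

```python
def classification_metrics(tp, tn, fp, fn):
    """Validation measures from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes occur
    in the data and in the predictions.
    """
    sensitivity = tp / (tp + fn)   # share of true positives that are detected
    specificity = tn / (tn + fp)   # share of true negatives that are detected
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
    }


# Hypothetical counts for a validation set of 20 wells (illustration only).
for name, value in classification_metrics(tp=8, tn=6, fp=2, fn=4).items():
    print(f"{name}: {value:.3f}")
```

The AUC is not included here because it needs the predicted probabilities for every sample, not only the four counts.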

Relative importance of causative factors
The Mean Decrease Accuracy (MDA) method is applied in this research work; it is useful for ranking and selecting among the fourteen parameters related to nitrate concentration in groundwater.
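The idea behind MDA can be illustrated with a generic permutation-based sketch: shuffle one factor at a time and record how much the prediction accuracy drops. This is not the implementation used in this study. The toy classifier, the data and all names below are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted class matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(labels)


def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Average accuracy drop when each factor column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 is a pH-like factor, column 1 is noise.
rows = [[7.6, 1.0], [7.8, 5.0], [7.9, 2.0], [8.0, 4.0],
        [8.3, 3.0], [8.5, 1.0], [8.6, 5.0], [8.8, 2.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def toy_model(row):
    """Stand-in classifier: high-nitrate class when the pH-like factor exceeds 8.1."""
    return 1 if row[0] > 8.1 else 0


importances = mean_decrease_accuracy(toy_model, rows, labels)
print(importances)
```

A factor that the model ignores gets an importance of zero, while a factor the model relies on gets a positive value; ranking the factors by this value gives an ordering of the kind reported in this study.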

Chemical analysis of coastal groundwater
The USSL diagram, proposed by the "United State Salinity Laboratory (USSL)" for classifying irrigation water, plots salinity hazard on the X axis against sodium hazard (SAR) on the Y axis. This diagram (Fig. 9) classifies water into 16 classes. To determine the salinity and sodium hazard, 42 samples were selected. C3S1, which represents high salinity and low alkalinity, accounts for 34.2%. C2S1 accounts for 17.02% of the total, indicating moderate salinity and low alkalinity. The C3S2 class indicates high salinity and moderate alkalinity and represents 32% of tube wells. Another important class is C3S4, which indicates very high alkalinity and high salinity and covers 12.76% of all tube wells. Only 2.12% of tube well samples fall in C4S4, which represents very high alkalinity and salinity.
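A simplified sketch of how a sample could be assigned to a USSL class follows. It assumes the usual definition SAR = Na / sqrt((Ca + Mg) / 2) with concentrations in meq/l, and the commonly quoted EC limits of 250, 750 and 2250 µS/cm. The fixed SAR limits of 10, 18 and 26 are a simplification, since the boundaries on the actual diagram slope with EC. Ca²⁺ is an assumed input that is not among the parameters listed in this study, and all sample values are hypothetical.

```python
import math


def sodium_adsorption_ratio(na, ca, mg):
    """SAR from Na+, Ca2+ and Mg2+ concentrations given in meq/l."""
    return na / math.sqrt((ca + mg) / 2.0)


def salinity_class(ec):
    """Salinity hazard class from EC in microsiemens per cm."""
    if ec < 250:
        return "C1"
    if ec < 750:
        return "C2"
    if ec < 2250:
        return "C3"
    return "C4"


def sodium_class(sar):
    """Sodium hazard class from SAR, using simplified fixed limits."""
    if sar < 10:
        return "S1"
    if sar < 18:
        return "S2"
    if sar < 26:
        return "S3"
    return "S4"


def ussl_class(ec, na, ca, mg):
    """Combined class label such as 'C3S1'."""
    return salinity_class(ec) + sodium_class(sodium_adsorption_ratio(na, ca, mg))


# Hypothetical sample (illustration only): SAR = 10 / sqrt(4) = 5.
print(ussl_class(ec=1000.0, na=10.0, ca=4.0, mg=4.0))
```

Counting the labels over all samples gives class shares of the kind quoted for Fig. 9.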
The Wilcox diagram is an essential diagram for analysing groundwater quality. It is divided into five classes: (i) excellent to good, (ii) good to permissible, (iii) permissible to doubtful, (iv) doubtful to unsuitable, and (v) unsuitable. The highest percentage of samples falls under the permissible to doubtful category (59.23%), followed by the doubtful to unsuitable category (27.27%); the good to permissible category holds 9.09%, and very small percentages fall in the excellent and unsuitable categories (2.27%). It can be concluded that the largest number of samples are in doubtful condition, so agricultural practices are threatened.

Discussion
Identifying the hydro-chemical properties, especially nitrate contamination, and its mitigation strategy in the coastal district of South 24 Parganas is important work. In our study, we prepared the nitrate susceptibility map of the district with the LR model; it depicts where high, medium and low nitrate susceptibility occurs. Different anthropogenic activities, such as industrial activity, agricultural activity and sewage, are highly correlated with groundwater nitrate concentration. Several researchers have shown that nitrate concentration is directly associated with different land-use patterns 35,36. According to Kumazawa 37, the use of nitrogen fertilizer in agricultural activities has a strongly negative impact. Groundwater pollution and nitrate concentration are highly correlated with each other 38.
Various studies still describe the hydro-chemical properties of groundwater and nitrate concentration susceptibility in coastal districts using different methods and models, such as LR. In our study, a large proportion of the area falls under the very high nitrate susceptibility zone. The total area is divided into five susceptibility zones: very high, high, moderate, low, and very low. Ref. 39 used RF and a Genetic Algorithm (GA) to assess groundwater vulnerability. Pal et al. 28 used the RF and MDA methods for a nitrate susceptibility prediction approach in a coastal district. In our research, we used fourteen nitrate conditioning factors. After the multi-collinearity analysis, we ranked them using the MDA method. Among all variables, pH was the most important and ranked first, with a value of 0.77; it strongly controls the nitrate concentration in the coastal aquifer in South 24 Parganas. The nitrate concentration in South 24 Parganas district is very high, so diseases such as blue baby syndrome, fluorosis, diarrhoea and skin cancer are common in this area 40. Many researchers have studied groundwater quality in coastal regions using methods such as machine learning and GIS-based approaches 3,41,42. To determine the health risk due to nitrate contamination, we used accepted field-based methods and techniques. Pal et al. 28 used the same technique to assess nitrate susceptibility in Indian coastal aquifers.

Conclusions
Different parameters were used to determine the concentration of nitrate in coastal multi-aquifers: pH, Cl⁻, As, F⁻, EC, Mg²⁺, NO₃⁻, K⁺, temperature, SO₄²⁻, PO₄²⁻, Na⁺, salinity, depth and HCO₃⁻. Fifty-eight samples were used in this work; the most important factor is pH (0.77), followed by Cl⁻ (0.71), while variables such as depth, temperature and HCO₃⁻ are less important than the other factors. Nitrate in groundwater comes from several sources, such as anthropogenic activities, agricultural activity and sewage water, and it affects the coastal aquifer. In this research we used data mining tools such as SPSS, Diagramme software and ArcGIS to determine the nitrate concentration in the coastal district of South 24 Parganas. The LR model was used to determine the nitrate concentration of the study area. In our study, the values of specificity, sensitivity, AUC and F score in the training stage (0.911, 0.915, 0.92 and 0.928) are greater than in the validation stage. In the validation stage, the sensitivity, specificity, F score and AUC values are 0.885, 0.882, 0.89 and 0.892, which shows that the model is applicable. Some portions of the region face higher nitrate concentrations than others: the north-western, mid-western and some parts of the northern portion face high nitrate concentrations. To assess water quality and its suitability for crop production, we used Piper's diagram and the USSL diagram. Unscientific activities, such as industry, agricultural practices and heavy use of chemical fertilizers, also lead to high nitrate concentrations in this region. Another main problem in this region is saltwater intrusion into agricultural fields due to naturally occurring events such as cyclones and floods. People of this region suffer from a scarcity of pure drinking water. They depend only on shallow and deep tube wells for their daily potable water, which is comparatively arsenic- and fluoride-contaminated. Because this contaminated water is the primary source of drinking water, residents of this region suffer from several diseases, such as diarrhoea and kidney damage.
This research has several limitations. Firstly, we did not consider geology, soil type, land use, land cover pattern, and other hydrogeochemical parameters that may be responsible for nitrate concentration in an area. Still, we have considered several nitrate conditioning factors that are highly influential and mostly derive from the above-mentioned parameters. Secondly, only one model, LR, was used to determine the nitrate concentration of this coastal district. In the future, more advanced methods can be applied to predict nitrate susceptibility. However, LR gives a noteworthy prediction that is quite similar to the actual condition of this region, as also shown by the results of all employed validating techniques. Therefore, this study closely reflects the ground scenario and accurately describes the existing alarming condition; policymakers and stakeholders can thus take appropriate steps to reduce these adverse effects and create a healthy environment for the local people of this region.

Figure 1. Location map of the study area (this map was generated using ArcGIS, version: 10.3.1, www.esri.com/arcgis).
In the MDA ranking for South 24 Parganas, pH is followed by Cl⁻, with a value of 0.71. Other factors, namely As, F⁻, EC and Mg²⁺, occupy the third, fourth and fifth positions, with values of 0.69, 0.69, 0.67 and 0.55, respectively. Factors such as depth, temperature and HCO₃⁻ are less effective for nitrate concentration in groundwater in this study, with values of 0.32, 0.25 and 0.23, respectively. The values of specificity, sensitivity, AUC and F score in the training stage (0.911, 0.915, 0.92 and 0.928) are greater than in the validation stage. In the validation stage, the values of sensitivity, specificity, F score and AUC are 0.885, 0.882, 0.89 and 0.892, which shows that the model is applicable.

Table 1. Descriptive statistics of selected parameters.

Table 2. Multi-collinearity values for several explanatory factors.

Table 3. Values of model evaluation.