Introduction

The Holocene quaternary alluviums of Mid-Gangetic Plain (MGP) supports around 60 million population of India. The region while apparently fertile for agriculture is enriched with high arsenic deposits, which gets into the groundwater system through over-pumping and dissolution of iron-oxy-hydroxides (Fe-OOH) via fluctuating redox conditions1,2,3. Arsenic release is suspected to be microbial degradation of sedimentary organic deposits under the presence of O2 and NO3− 4,5,6. The process further accelerates with suitable mineralogical and geological conditions7,8,9. On the other hand, human-induced anthropogenic activities like overutilization of groundwater disrupt the equilibrium between the oxic and anoxic zones causing the increased formation of metal oxy-hydroxides. Owing to the non-desiring ubiquitous nature of arsenic, the uptake mechanism by the animals and plants is purely adventitious in the biotic environment.

Both the toxic pentavalent As (V) and trivalent As (III) forms intrude into the cell cytosol by mimicking the properties of nutrients and essential substrates like phosphates, glucose and glycerol10. While trivalent aromatic arsenic forms, carry the highest toxicity but the stability deteriorates as a function of temperature and sample matrix when precursors like methylated forms take the dominance. Significant risk exists when anoxic conditions vis-à-vis in the groundwater interaction interfaces persist11. In neutral solutions, arsenic exists as arsenite (As(OH)3), more chemically similar to glycerol12, causing the un-intentional conduction through the AQPs bi-directional channels13. The exposed population of MGP has exhibited significant amount of carcinogenicity, ranging from skin ailments like keratosis and hyperpigmentation to neurological disorders like brain diseases14,15. Further evidence shows that prolonged and short-term exposure quantified through arsenic aggregation in hair, nail and urine carries the risk of developing cancer shortly in the near future16,17. The situation is grimmer for the MGP and the other fluvio-deltaic regions of the subcontinent as there is an average background exposure of 15.9 and 10.3 mg kg−1 in the onshore and offshore water samples respectively, with the finer sediments having higher concentration at increasing depth18. Studies since 2000 are continuously indicating elevated arsenic concentration exceeding WHO limits of 10 µg L−1 19,20.

While the hydrogeochemical investigation of groundwater can reveal much about the underlying mechanism of arsenic release, an alternate method involves using soil and its secondary characteristics as predictors for evaluating the arsenic vulnerability21. Calcisols have been believed to affect the concentration of geogenic anions in the groundwater. The release is usually triggered under the increased presence of sodium bicarbonate through reduction under alkaline conditions. Low concentration of Ca2+, acting as a sink during silicate weathering, generally complements the release process21. Physical processes like precipitation and land surface temperature also quantitatively impact in changing the arsenic vulnerability through changes in the evapotranspiration progressing through the aquifer22,23.

The high presence of arsenic in the groundwater of MGP is further complemented due to (i) spatial geomorphological variation, (ii) climatic differentiation, and (iii) land use and landcover patterns. A landcover of agricultural crop field accompanied with changes in the irrigation pattern impels the spatial differentiation in the natural recharge. An excess recharge enriched with anthropogenic nitrate added via chemical fertilizers can change the weathering patterns causing contamination of arsenic2,24 and fluoride20. Excessive abstraction of groundwater under these circumstances not only degrades the quantity but can also lead to aggravation of geogenic pollution as can be seen from the Bangladeshi arsenic epidemic of the late twentieth century1,18,25,26,27. Co-occurrences of arsenic and fluoride along with contaminants of emerging concern have potentially hazardous consequences for different life forms. Thus, a proper understanding of the groundwater hydro-geochemistry is very imperative with implications on influencing the decision and policy-making process affecting groundwater conservation28.

In this regard, deriving a meaningful interpretation from the multi-parameter dataset is a challenge. Multivariate analysis (MVA), like factor analysis (FA), principal component analysis (PCA) and hierarchical cluster analysis (HCA), have been commonly used in such hydrogeochemical investigations29,30,31,32. While reducing the dimensions of datasets, the discreet knowledge about the data may get lost, but the essence of the data or the broad meaning conveyed by the data gets preserved in the form of data clusters. In light of the stated facts, the current study was undertaken (i) to understand the impact of seasonal change i.e. monsoon on the hydro-geochemistry of Darbhanga district through integrated application of hydrogeochemical, bivariate and multivariate statistical analysis (MVA), (ii) to propose a probabilistic model-based seasonal mapping of arsenic occurrence and exceedance, and (iii) to put forward the conceptual model to explain the observed arsenic spatial and seasonal variation. Overall, this study puts an effort to evaluate whether natural recharge or anthropogenic influence is more critical for arsenic mobilization in the Alluvium plains of MGP.

Results and discussion

Prevalence of ions, water facies and hydro-geochemistry

Major ions chemistry of the area indicates a groundwater state that is mostly within the acceptable drinking water range when compared to the World Health Organisation33 and Bureau of Indian Standards (BIS)34 standards (Supplementary Table 1). However, the dominating influence of monsoon precipitation on groundwater quality, visible through drastic changes in the ORP and pH, is likely to enhance arsenic pollution vulnerability in the groundwater. There was a noticeable increase in EC and TDS from pre-monsoon to post-monsoon, implying active weathering and dissolution through the infiltrating rainwater (Supplementary Table 1). This is supported by the enhanced levels of Na+, K+, Ca2+, Mg2+, SO42−, HCO3 and Cl in the post-monsoon (Fig. 1). The involvement of weathering and dissolution is further revealed through the Gibbs plots16,19 (Supplementary Fig. 1). All the groundwater samples were found to fall within the rock−water interaction or the weathering-dominated region of the plot. On closer observation, we noticed a decrease in (Na+ + K+):(Na+ + K+ + Ca2+) and (Cl):(Cl + HCO3) ratios towards the post-monsoon season, which was indicative of elevation in carbonate weathering due to precipitation. Piper diagrams (Fig. 2) further substantiate the fact by indicating that the dominant water types during pre-monsoon season were dominantly of mixed Ca-Na-HCO3 type which changes towards Ca-HCO3 type in the post-monsoon owing to significant recharge.

Fig. 1: Map illustrating the study area of Darbhanga and the spatial distribution of various water quality parameters in mg L−1.
figure 1

The concentration of anions such as SO42– (a, b), NO3 (c, d) and HCO3 (e, f) has increased significantly from pre to post monsoon. Alkali earth metals such as Ca2+ (g, h), Mg2+ (i, j) and alkali metals like Na+ (k, l) have similar enrichment trends after monsoon. Change in parameters like pH (m, n), TDS (o, p) and Cl (q, r) signify the activeness of various anthopogenic activities.

Fig. 2: Piper plot showing the evolution of groundwater.
figure 2

The dominance of carbonate weathering during post monsoon succeeds the activeness of silicate weathering during pre monsoon. Seasonality has propeled a major shift in the groundwater evolution of the region with reverse ion exchange reaction becoming prominent during post monsoon along with water type changing to calcium bi-carbonate.

Spatial hydrogeochemical profiling

The spatial distribution of SO42− shows that the concentration in the entire study area is way below the permissible drinking standards of BIS 1050034 (<200 mg L−1) and WHO 201133 (<250 mg L−1) (Fig. 1). However, the concentration of SO42− has increased during the post-monsoon as compared to pre-monsoon. A significant change takes place in the north of Darbhanga district where there is an almost 100% increase in the concentration values. The spatial distribution of NO3 shows that the part of the problem of introduction of NO3 in Darbhanga could be the increased usage of fertilizers during the pre-monsoon Rabi crops as compared to less fertilizer application during the post-monsoon Kharif crops. Consistent concentration of NO3 in the north-western region of the district signifies that Darbhanga city has little contribution in changing the dynamics of NO3 across different seasons, and hence agricultural fields play an essential role in its cyclicity. Within the study area, it is evident that during the post-monsoon, the northern region of the district experiences a more substantial carbonate weathering as compared to the southern part of the district. The change in the spatial distribution of TDS during the pre- and post-monsoon shows that the recharge flux does have the ability to change the overall drinking suitability of groundwater (Fig. 1). Decrease in TDS in the south-eastern region shows that agricultural fields are more susceptible to infiltration of rainwater and hence more dilution will occur leading to decreasing levels of TDS in the groundwater. The northern region has more presence of built-up regions leading to less dilution during the post-monsoon period.

Seasonality in the hydro-geochemistry

The seasonal aspect of the hydro-geochemistry was evaluated by utilizing bivariate methods like scatter plots and correlation analysis. The plots of total cations (Tz+) versus (Na+ + K+) (Fig. 3a–c) and (Ca2+ + Mg2+) (Fig. 3d–f) confirms the higher involvement of silicate weathering and ion exchange during the pre-monsoon season and an elevation in carbonate weathering towards the post-monsoon. During the pre-monsoon, majority of the samples plot closer to the 2:1 line (Fig. 3a), while towards the post-monsoon the spread decreases along the 2:1 line (Fig. 3b). This was also confirmed through the strong correlation of Na+ with both EC and TDS (Supplementary Table 2). Negative correlation of K+ with both TDS and EC suggests that the silicate species involved in this process appears to be sodium feldspar or albite7,8. What we also observed was the strong correlation of Na+ with HCO3, suggestive of ion exchange with Ca-Mg-HCO3 type water during the drier pre-monsoon i.e. lower water table leading to the gradual shift to mixed Ca-Na-HCO3 water type7. During the post-monsoon, the elevation in carbonate weathering is also supported by the strong correlations of Ca2+ and Mg2+ with EC and TDS, indicating the involvement of dolomite weathering22. An increasing Na+:Cl ratio with EC also points towards the excess of Na+ due to silicate weathering, which appears to be dominating during the pre-monsoon season (Fig. 3g–i). An immediate decrease in the Na+:Cl ratio towards the post-monsoon season (Fig. 3i) was a case of halite dissolution promoted by precipitation. Na+ vs. Cl (Fig. 3j–l) exhibit an excess of Na+ compared to Cl implying silicate weathering while samples close to the 1:1 line are indicative of halite dissolution.

Fig. 3: Scatter diagrams tracing the hydro-chemistry of the groundwater.
figure 3

Possible change in weathering from silicate to carbonate has been detected (af). Mineral mediated enrichment of alkali metals has weakened with monsoon (gl). The role of background anthropogenic interferences has been found to be relatively less (mr).

On the other hand, anthropogenic influences were evaluated by plotting K+ vs. NO3 (Fig. 3m–o) and Cl vs. NO3 (Fig. 3p–r). Levels of both K+ and NO3 are quite low in the groundwater; moreover, for most samples, there is no visible relationship between the two. We have already stated that the background levels of NO3 is very low; however, the reasons behind the low concentrations of K+ are still not clear. A few of the samples that fall along the uniline could be released due to agricultural usage of potassium and NO3-based fertilizers8. During the pre-monsoon season, Cl levels in the groundwater remain lower than the post-monsoon. Also, in the post-monsoon, there was not only a spike in Cl level but also a weak relationship with NO3 being observed. Given the fact that Cl also displayed positive correlations with Ca2+, Mg2+ and SO42− during the post-monsoon, it is likely that it could be an anthropogenic occurrence along with incidences of reverse ion exchange. Use of chemicals like bleaching powder35, wet deposition from the various industrial estates could be some of the plausible sources of Cl in groundwater during this period. The molar ratio of Ca2+:Mg2+ which remains closer to 2 for several samples during the pre-monsoon (Supplementary Fig. 2) again reaffirms the dominance of silicate weathering during the pre-monsoon season17. During monsoon, there was a significant shift in this ratio. However, for a wide range of samples, this ratio remained between 1 and 2, denoting a mix of calcite and dolomite weathering. Based on scientific literature we can attribute this to the dominance of carbonate weathering8, which we have already discussed earlier. A ratio close to 1 indicates dolomite dissolution, while a ratio close to 2 shows calcite weathering. Higher rates of carbonate weathering are also supported by the (Ca2+ + Mg2+) vs. (SO42− + HCO3) plot (Supplementary Fig. 2) in which most of the groundwater samples plot below 1:1 line5,23. The relationship of Cl with the alkaline earth elements Ca2+ and Mg2+ is not significant during the pre-monsoon season; however, as observed earlier in the correlation matrix, this relationship becomes much more prominent during the post-monsoon. We suspect the role of various anthropogenic factors behind enhancing Cl in groundwater. However, a more critical analysis points towards reverse ion exchange reaction1. The related discussion has been further elaborated in Supplementary Fig. 2 [(Ca2+ + Mg2+)/HCO3 vs. Cl]35.

The subdued nature of carbonate weathering during the pre-monsoon season can also be observed based on the correlations of HCO3 with Ca2+ and Mg2+ which remained low during the pre-monsoon (r = 0.37 and 0.47 respectively, Supplementary Table 2) and becomes stronger during post-monsoon (r = 0.71 and 0.66 respectively). The occurring exchange reactions could be traced from the relationship of Cl with Ca2+ and Mg2+ which has been analysed in Supplementary Fig. 2 [(Ca2+ + Mg2+)/HCO3 vs. Cl]. The increase in (Ca2+ + Mg2+):HCO3 ratio with salinity denotes an increase in the alkaline earth metals due to the process of reverse ion exchange. Speculations concerning HCO3 depletion must be ruled out, as the conditions in the groundwater were alkaline during both pre-monsoon and post-monsoon seasons. The occurrence of arsenic in the oxic groundwater conditions could be one of the causes behind its low exposure in the region (Fig. 4). Drawing from the empirical relationship between pH and ORP, it could be seen that during pre-monsoon the As (III) and As (V) have a high probability of occurring as H3AsO30 and HAsO42− respectively. The occurrence of H3AsO30 usually mediates in acidic water with a very high sensitivity to HCO3 concentration. High monsoon recharge leads to severe redox changes in the groundwater, causing species to be distributed among various oxidation forms. As (V)-dominated Alkali-Oxic (AO) water could be seen triggered due to sodic conditions under desorption mobilization mechanism.

Fig. 4: Dominant parameters impacting the occurrence of arsenic.
figure 4

Scatter diagrams between arsenic and depth (ac), pH (df), ORP (gi) and HCO3 (jl).

Scenario analyses through PHREEQC-based saturation index estimation

The stability of the groundwater is dictated by the thermodynamic equilibrium of the various minerals present in it. The estimation of the saturation index (SI) becomes important in order to quantify the dissolution/precipitation potential of the minerals besides helping in understanding the reaction pathways22,27. SI estimation is a sequential process requiring the estimation of molality (m), valency (Z), activity coefficient, and ionic strength (I). SI is defined as the ratio of Kiap (ion activity product) and Ksp (solubility product). SI > 1 (Kiap > Ksp) denotes supersaturation, <1 (Kiap < Ksp) denotes undersaturation, and Kiap = Ksp denotes equilibrium condition.

For our study, the seasonal SI mean of most minerals with exception of aragonite, calcite, dolomite, hydroxyapatite remains below zero denoting a state of undersaturation causing ion release (Fig. 5). In case of aragonite, it could be seen that as post-monsoon comes (SI > 0), there is precipitation of Ca2+ and CO32− to form the mineral, possibly providing a sinking effect to the Ca2+ concentration. However, near equilibrium conditions (SI ~ 0) dictate that this effect could be marginal. Further, weak undersaturation of aragonite, calcite and dolomite during only monsoon as opposed to the other two seasons provides a clear indication that the rainfall recharge is the critical factor controlling the ion release. Over-saturation of hydroxyapatite (Ca10(PO4)6(OH2)) explains that despite significant availability of Ca2+ ions, its sequestration through precipitation occurs through entire monsoon. The strongest correlation between Ca2+ and HCO3 (r = 0.86; Supplementary Table 2) and between Ca2+ and SO42− (r = 0.80) is observed during the monsoon season, which provides more basis to our understanding that availability of Ca2+ mostly occurs through four minerals namely aragonite, anhydrite, calcite and dolomite. Similar undersaturation trends are observed in arsenolite (As4O6) and claudetite (As2O3) signifying that oxidation of arsenic sulfides is still active. Seasonal consistency in the correlation between arsenic and SO42− (r = −0.32, −0.26 and −0.32 during pre-monsoon, monsoon and post-monsoon respectively) and relatively stronger correlation between arsenic and Ca2+ during pre-monsoon (r = 0.53) point towards minerals like gypsum and anhydrite which dissociate under the present alkali-oxic (AO) conditions. Although it is important to emphasize that anions like SO42− are naturally available in the atmosphere but an exceptionally high correlation between TDS and SO42− during only monsoon season shows that the contribution of rainfall (rich in sulfate) in driving this mechanism is highest.

Fig. 5: Saturation index of various minerals obtained through PHREEQC modelling and their variations during the different seasons.
figure 5

Undersaturation of arsenolite (SI < 0) shows high vulnerability for the future.

A critical analysis of the various sodium minerals shows that during pre-monsoon and monsoon, natron (Na2CO3.10H2O) could be one of the sources for providing Na+ and HCO3 in the groundwater (SI ~ −10; r = 0.82 during pre-monsoon; r = 0.83 during monsoon). This is also supported by strong correlation between HCO3, Na+ and TDS signifying that the ions in the groundwater must be a direct manifestation of natron dissolution. The cumulative weakening of this relationship during the post-monsoon and increase in correlation between PO43− and Na+ (r = 0.46) shows that agrochemical leaching has disturbed the existing geogenic balance. It is worth noting that ORP, which is a dominant arsenic controlling factor, shows highest correlation with NO3 (r = −0.49) and PO43− (r = 0.38) during post-monsoon.

Principal component analyses

PCA was done on the seasonal data to decompose it into five principal components (PCs) with a total of 86.1% variance explaining capability out of which the first three PCs can explain 74.3% variance (Supplementary Table 3). Parallel comparison revealed several differences in the governing hydro-chemistry of the pre-monsoon and post-monsoon, which were also analysed earlier in the bivariate and correlation analyses. PC1 of the pre-monsoon season has the highest variance of 37% with strong positive loadings imparted by EC, TDS, Na+ and HCO3 (Fig. 6). This PC is governed by silicate weathering, showing that ion exchange and sodium enrichment are predominantly mediated via silicate weathering during pre-monsoon36. This appears to be a cyclic phenomenon altering between the pre-monsoon and the post-monsoon as reverse ion exchange becomes the norm during the post-monsoon season. Two previous studies have also revealed that the soil of the Indo-Gangetic Plains is sodic6,9,14,26. Presence of sodium carbonate was found to immobilize Ca2+ resulting in a high sodium adsorption ratio (SAR) and ultimately, a high exchangeable sodium percentage (ESP). PC2 is represented by positive loading from Mg2+ and negative loading from pH with 20% variance. The exact hydro-chemistry of this PC was not clear; therefore, it was studied further by the application of hierarchical cluster analysis.

Fig. 6: Three-dimensional visualization of PCA loadings.
figure 6

There exists a characteristic difference in the processes active during a monsoon, b pre-monsoon and c post-monsoon season.

Although the level of NO3 in the groundwater was very low, however, it imparted a strong loading on PC3 during the pre-monsoon season. The lone representation of NO3 was considered to be anomalous and was studied further with the application of HCA. PC4 with 9.3% variance is represented by high positive loading from PO43− and negative loading due to ORP representing anthropogenic factors primarily of the agricultural origin, as seen from its positive correlation with Ca2+ (r = 0.5, Supplementary Table 2) along with a low positive loading of 0.49 in PC4. The causative agrochemical appears to be calcium phosphate besides geogenic contributions from minerals such as hydroxyapatite (Ca10(PO4)6(OH2)). PC5 contributes to 7.9% variance and has significant positive loading only due to Cl. Since the contributing mechanism is not clear, therefore, it has been studied further with HCA (Supplementary Fig. 3). PC1 with 41.4% variance for post-monsoon has a much more dominant representation of carbonate weathering as evident from the positive loadings of Ca2+, Mg2+, EC and TDS. Both calcite and dolomite weathering were found to be equally prevalent as revealed through the scatter plots and correlation. Cl was also found to impart positive loading on PC1 during the post-monsoon season which again reaffirms the occurrence of reverse ion exchange reaction with changing seasons. An influx of fresh rainwater with higher DO could lead to an elevation in ORP, while at the same time the low levels of NO3 may be further diminished due to dilution in the post-monsoon. This explains the reason behind the positive loading of ORP and negative loading of NO3 in PC2 during the post-monsoon season. The last PC of post-monsoon with 12.8% variance is represented by Na+ and pH reflecting the sodic nature of the soils in the Gangetic plains.

Hierarchical cluster analyses

HCA was applied to gain a better understanding and validate the findings of the bivariate and PCA analyses (Supplementary Fig. 3). The Dendrogram generated for the pre-monsoon season has three distinct clusters. The first cluster is represented by EC, TDS, HCO3 and Na+, which shows silicate weathering and ion exchange resulting in Na-HCO3 type groundwater. The second cluster is represented by Ca2+, PO43−, NO3 and pH. The strong loading of NO3 in PC3 during pre-monsoon together with PC3 and PC4 represent agrochemicals16,19. A single giant cluster is represented by K+, SO42−, Mg2+, Cl and ORP, which also points towards a different class of agrochemicals for the correction of potassium (K2SO4) and magnesium (MgCl2) deficiencies in the soil. In the post-monsoon season three clusters were identified; the first group consists of two sub-clusters; the first being EC, TDS and HCO3, and the second being Ca2+, Mg2+ and Cl.

As explained in the PCA, this group represents the dominance of carbonate weathering during the post-monsoon with the secondary involvement of reverse ion exchange. The second cluster consists of SO42− and NO3. Although the correlation matrix does not reveal any trend between the two but the HCA offers a more transparent look at the possible occurrence of wet deposition of these anions during the rainy season. Darbhanga district is home to several industrial estates which could be linked to this problem. The last cluster consists of two sub-clusters; one consisting of pH and Na+, and the other consisting of ORP, PO43− and K+. This implies high sodic nature of the soil along with agrochemical usage during the post-monsoon season.

Probabilistic exceedance and seasonal occurrences

The primary objective behind mapping the probability exceedance of critical parameters is to validate their response with a physical/chemical process change. Figure 7 correctly identifies the arsenic, HCO3, pH, NO3 and SO42− anomalies existing during different seasons. The monsoon probability trends are steeper and more distant as compared to other season trends (Fig. 7a). This validates the role of rainfall recharge when we see the complementing trends of SO42−, HCO3 and pH. It could also be stated that arsenic in the concentration range of 8−5 µg L−1 is much more susceptible to a physical process such as precipitation-led recharge mediated primarily through SO42− (Fig. 7). pH in the range of 6.5 and 7.5 can also be validated as a contributing factor when there is a shift in the HCO3 concentration between 150 and 350 mg L−1.

Fig. 7: Exceedance probability of various parameters impacting arsenic concentration.
figure 7

Corresponding change in exceedance values of (a) arsenic and (b) HCO3 validates the role of rainfall recharge in arsenic mobilization process. Factors like (c) NO3, (d) pH, (e) TDS and (f) SO42− show isolated yet confirmatory signatures of anthropogenic intrusion.

The presence of arsenic above 5 μg L−1 has been mapped in terms of a probability for all the three seasons (Fig. 8). The Akaike information criteria (AIC) for monsoon is 98.88, which signifies that the model has retained relatively higher information of the original data while fitting (Supplementary Table 4). In terms of model fitting, a comparison between the three seasons show that pre-monsoon spherical model is the best fit model (AIC = 66.79) (Supplementary Figs. 46). However, apart from model fitting, the capability of prediction of the pre-monsoon spherical model is relatively weak compared to monsoon Gaussian model (RMSS = 0.976; RMSE = 0.087). The post-monsoon arsenic model performs relatively weak as compared to pre-monsoon and monsoon season models, as the variance in the datasets diminishes greatly. A comparative analysis of standard deviation shows that pre-monsoon models have a peak distribution and less spread (σ = 1.51) as compared to other season models further validating its fitting capabilities. The increase in the spread of region with high arsenic concentration during the monsoon season after the pre-monsoon shows that rainfall-led dilution is having its impact in the transportation of arsenic. The drainage and the general lineament of the region play a role in transporting arsenic in the northward’s direction.

Fig. 8: Probability distribution of arsenic above 5 µg L−1.
figure 8

There is a marked difference in the suitability of groundwater when arsenic is considered as one of the parameters in WQI.

The very fact that the probability of occurrence of arsenic above 5 μg L−1 shrinks to 0.5 from above 0.8 during post-monsoon shows that arsenic is being stabilized in the bed and the presence of iron-oxy-hydroxides (Fe-OOH) has become marginalized as the concentration of Fe reaches to almost zero37. Using a GIS-based overlay technique, the study region has been mapped for the Water Quality Index (WQI) (Supplementary Table 5). A WQI below 50 is considered excellent with regards to drinking standards and a WQI between 50 and 100 is regarded as very good. The entire region though falling in excellent to the very good category has geogenic arsenic contamination as the worrying factor. The impact of consuming water contaminated with arsenic is much more severe as compared to consuming water with excessive TDS. Therefore, through spatial distribution mapping, it has been tried to understand the areas where arsenic remains a potent challenge in changing the water quality dynamics. The results present a concerning reality where it has been found that arsenic is a dominant factor which could change the WQI from 25 to 35% in almost the entirety of the study area during pre-monsoon, even if its ability to cause mutagenicity and carcinogenicity is ignored. At small places in the extreme north and the central of Darbhanga district the percentage rise to almost 35%.

Conceptual modelling and its implications

By following an integrated approach involving all the analytical and statistical techniques, we have developed a conceptual understanding of the region vis-à-vis the operative processes active for arsenic mobilization (Fig. 9). It should be noted that our study was conducted under three economic settings, i.e. (1) agriculture, (2) urban and (3) rural, which differ significantly in terms of arsenic susceptibility and groundwater exploitation. As opposed to the popular belief, the agriculture region showed less arsenic (<10 µg L−1) which shows that the arsenical pesticides are not used as a possible mitigation strategy against crop weeds. Also, it shows that monsoon-led irrigation is still prevalent and the pressure on groundwater exploitation has not reached the levels of West Bengal and Lower Gangetic Plains. The high fertility of the alluvium sediments has also caused less exposure of fertilizers as the concentration of nitrate and phosphates are mostly within the permissible levels. However, the presence of arsenic above 10 µg L−1 in the urban setting shows that the threat of arsenic contamination can loom quickly if over-exploitation of groundwater takes place continuously. A significant problem is also associated with lowering of pH due to high concentration of anion such as sulfate, which may make the conditions optimum for arsenic adsorption over iron-oxy-hydroxides. This is a big challenge for regions where fewer checks and balances are put in place in terms of curbing industrial effluxes into the open system. The rural region which is characterized by a mix of agricultural and commercial activities may prove hotbeds of arsenic toxicity as could be seen through the trends obtained in different seasons (Fig. 8). Also, the high undersaturation of arsenolite and claudetite (~−15) (Fig. 5) is a worrying cause as activation of arsenic breakdown can occur through micro-organism if triggers like nitrate and sulfate are not restricted for their intrusion in the groundwater.

Fig. 9: Conceptual model of the operative processes active in the study region.
figure 9

The governing hydro-chemical processes exhibit significant deviations due to leaching of Cl, SO42− and other pH affecting chemicals.

As far as the hydro-chemistry of the groundwater is concerned, silicate weathering was found to be prominent during the pre-monsoon; however, it would be important to point out here that the K+ in the groundwater could not be traced to silicate weathering as was seen from the low values as well as the lack of correlation between K+ and Na+. Based on the previous studies, it could be inferred that soda feldspar or albite are the dominant minerals in the aquifers of the study area as the soils of the Ganga plains are sodic in nature. This has further been supported by both the PCA and the HCA. MVA also revealed the cyclic change from ion exchange during pre-monsoon to reverse ion exchange during the post-monsoon. As post-monsoon advanced, carbonate weathering was found to take precedence over silicate weathering leading to higher concentrations of Ca2+ and Mg2+. Statistically significant correlations of Ca2+ and Mg2+ with Cl also highlights the dominance of reverse ion exchange during post-monsoon. Overall, anthropogenic forcing on the groundwater quality was found to be marginal and mostly consisted of agrochemical (dominant being calcium nitrate and calcium phosphate fertilizers) usage and isolated incidences of chemicals like bleaching powder. However, one of the key findings of the HCA analysis was the possibility of wet deposition of the anions (SO42− and NO3) during the post-monsoon season. Eventually, the present study clearly displayed the crucial role played by MVA; not only were these techniques supportive of the bivariate analyses but in several incidences where the observed trends were unclear, the MVA offered a much clearer explanation (Supplementary Fig. 7).

Due to the high population density and agriculture intensiveness of the region, excessive abstraction of groundwater and its contamination due to nutrient leaching from agricultural inputs go hand in hand. The detection of arsenic above 10 µg L−1 at only 50 m depth signifies that the situation can be difficult to manage if a policy limiting the groundwater abstraction is not put in place. Further, changes in pH and ORP over seasons signifies that the industrial effluents during monsoon are also finding their pathway in the aquifer system.

Regions with high risk of arsenic contamination are also regions with high population density; therefore, there will be risk for the vast population to develop cancer through both dermal and oral exposure. At a policy level we, therefore, suggest that there is a dire need to create hazard buffers where the activities of groundwater extraction could be limited as well as the industrial extrusion be quantified and checked.

Methods

Study area

Darbhanga district in Bihar extends between 85o40′ E and 86o10′ E latitudes and between 26o30′ N and 26o0′ N longitudes (Fig. 1) under the Ganga floodplains of the Bhagmati sub-basin. The lineament of the subsurface of the study region is straight with the geomorphological origin being of fluvial nature (https://bhuvan.nrsc.gov.in). The region has very less variation in the elevation from the mean sea level, with the terrain being flat and consisting of vast expanses of alluvial plains. The soil in the study region consists of a mixture of sand, silt and clays (calcareous) of different proportions1. The general drainage of the study region slopes slightly towards the south with a slight depression in the elevation at the centre of the study region (ASTER DEM 30 m spatial resolution https://earthexplorer.usgs.gov/). Despite having a tropical three-season climate (summer, winter and monsoon), the summer season predominates. May is the hottest month with temperature recorded above 40 oC. The region receives almost the same rainfall (1142.3 mm) as the average rainfall received by India (1200 mm) with 90% of the rainfall demand being met during the monsoon season (The National Oceanic and Atmospheric Administration (NOAA), https://www.noaa.gov/). The aquifers that consist of water-bearing sub-layers are of quaternary sediment origin. The total width of these formations at places reaches to 2 kilometers.

Sampling strategy and laboratory analyses

A total of 171 groundwater samples were collected during the pre-monsoon (57—during April 2011), monsoon (57—during June 2011) and post-monsoon (57—during January 2013). Collection was done in pre-washed polyethylene bottles after pumping the tube wells for 5−10 min. Raw samples were used for anion analyses, and filtered through Millipore (0.45 µm) and acidified to a pH ≤ 2 with HNO3 (2 M ultra-pure grade, Merck, India) used for cation analyses. Fourteen water quality parameters were analysed, out of which HCO3, pH, EC, TDS and ORP were analysed in situ using Hanna-Multi-parameter Water Quality portable metre (H19828). Na+ and K+ were analysed using flame photometer (Systronics-128), and Ca2+ and Mg2+ using ICP-OES (Perkin Elmer Optima 2100 DV). Chloride was analysed using argentometric methods38, while SO42−, PO43− and NO3 were analysed using UV Spectrophotometry (Shimadzu UV-1700)38. Arsenic concentration was determined using AAS (Thermo-Scientific Spectrometer 3000). Data validation and accuracy of the analytical methods was checked using normalized ionic charge balance. Bivariate plots were prepared using Microsoft Excel 201639 (Microsoft, Redmond, Washington, USA). SPSS 20 (IBM Armonk, New York, USA) was utilized for generating the correlation matrix. Arc-GIS 10.3 was used for spatial mapping and generating probabilities of spatial arsenic occurrence.

Saturation index

The saturation index (SI) is defined as the ratio of Kiap (ion activity product) and Ksp (solubility product). In the present study, the SI of anhydrite (CaSO4), gypsum (CaSO4.2H2O), halite (NaCl), epsomite, calcite, and dolomite was studied for the three seasons. For which, first of all, the ionic strength (I) was estimated using the molality (m) and valency (Z), as per Eq. 120.

$$I = 0.5 \times \mathop {\sum }\limits_{i = 1}^n m \times Z^2.$$
(1)

Using the value of ionic strength, the activities of the individual solute species were computed by Eq. 2.

$${\mathrm {Activity}} = m \times a,$$
(2)

where a is the activity coefficient estimated using the Debye−Huckel equation

$$\log\,a = \frac{{ - A \times Z^2 \times \sqrt I }}{{1 + ao \times B \times \sqrt I }},$$
(3)

where ao is the ionic radius and A and B are constants dependent on temperature and pressure.

Probability distribution of arsenic

The concentration of arsenic was predicted for different locations using kriging40. Measured As concentration of each location Z (x0, x1, x2, x3……) along with their spatial correlation was used as predictors. Further by associating a weight λ based on spatial arrangement and the distance between the sampling points (Eq. 4), an empirical fitting was done41.

$$Z\left( {So} \right) = \mathop {\sum }\limits_{i = 1}^N \lambda i \times Z\left( {Si} \right),$$
(4)

where Z (So) corresponds to the location at which prediction must be done. i is the number of sampled locations varying from 1 to 19 for each season. λi is the associated weight, the value of which has been determined by fitting a suitable variogram model and Z (Si) is the value of concentration at a known location42. In order to generate a prediction surface, an empirical semi-variogram was estimated based on the different relationships between the various pairs of sampling points (Eq. 5)43.

$$Empirical\,semi-variogram\left( {distance} \right) = 0.5 \times average\,(\left( {vi - vj} \right))^2.$$
(5)

In the next step we fitted a modelled semi-variogram to the points which form the empirical semi-variogram, which was used for computing three parameters namely sill, nugget and range. For estimating the probability of As concentration, two variables namely (i) indicator code (I) and uniform value (U is also known as standardized rank) are defined as the main and the auxiliary variable respectively44.

$$U\left( x \right) = \frac{r}{n},$$
(6)

where r denotes the rank and n is the number of observations (19 points representing 57 samples for pre-monsoon, monsoon and post-monsoon each). In terms of uniform value U(x), Eq. 4 can be further modified as given by Eq. 7.

$$\gamma \left( h \right) = \frac{1}{{2N(h)}}\mathop {\sum }\limits_{i = 1}^{N(h)} [Ux - Ux + h]^2.$$
(7)

For different pairs of observation points, the cross semi-variogram can be estimated as follows.

$$\gamma c\left( h \right) = \frac{1}{{2N(h)}}\mathop {\sum }\limits_{i = 1}^{N(h)} \{ \left[ {Ix - Ix + h} \right] \ast \left[ {Ux - Ux + h} \right]\} ^2.$$
(8)

The final estimation of probability is based on Eq. 9 (modifying Eq. 8)

$$Pn = \mathop {\sum }\limits_{i = 1}^n I\left( x \right) \times \lambda i + \mathop {\sum }\limits_{i = 1}^n U(x) \times \lambda u,$$
(9)

where λi and λu are the respective weights assigned to the indicator code (Ix) and uniform value (Ux) function respectively. In order to accurately predict the concentration values, the variance is kept minimum and the weight factors λi and λu are solved by Lagrange constraint method under un-biased conditions assuming ∑ λi = 1 and ∑λu = 0 by Eqs. 10 and 11.

$$\mathop {\sum }\limits_{j = 1}^n \lambda ij \times \gamma \left( {xj - xi} \right) + \mathop {\sum }\limits_{p = 1}^n \lambda up \times \gamma c\left( {xp - xi} \right) = \gamma \left( {xo - xi} \right) - \mu i,$$
(10)
$$\mathop {\sum }\limits_{j = 1}^n \lambda ij \times \gamma c\left( {xj - xi} \right) + \mathop {\sum }\limits_{p = 1}^n \lambda up \times \gamma u\left( {xp - xi} \right) = \gamma c\left( {xo - xi} \right) - \mu u,$$
(11)

where μi and μu are the Lagrange multipliers, γ and γc are the semi-variance and cross-variance of I(x) and γu is the semi-variance of U(x)39.

A threshold concentration value of 5 μg L−1 is selected for As-mapping distribution zones based on probability ranging from 0 to 1. The probability distribution maps are demarcated into various zones and corresponding change in the zones is studied for the three seasons, i.e. pre-monsoon, monsoon and post-monsoon. Usually the datasets are not normally distributed; therefore, normal score transformation is applied where the data are structured like standard normal distribution45. It is important to note that the present study has been designed specifically to understand the potential of monsoon recharge in alleviating the arsenic concentration in the groundwater. The role of anthropogenically induced heavy metal toxicity has not been accounted for based on the understanding of the region that dominantly depends on agriculture for its livelihood. Therefore, the role of metals like Pb, Hg, Cd, etc., which may have a role in arsenic cyclicity, does not come under the scope of this study.