Groundwater depletion causing reduction of baseflow triggering Ganges river summer drying

In summer (pre-monsoon) of recent years, low water level among the last few decades, has been observed in several lower Indian reaches of the Ganges (or Ganga) river (with estimated river water level depletion rates at the range of −0.5 to −38.1 cm/year between summers of 1999 and 2013 in the studied reaches). Here, we show this Ganges river depletion is related to groundwater baseflow reduction caused by ongoing observed groundwater storage depletion in the adjoining Gangetic aquifers (Ganges basin, −0.30 ± 0.07 cm/year or −2.39 ± 0.56 km3/year). Our estimates show, 2016-baseflow amount (~1.0 × 106 m3/d) has reduced by ~59%, from the beginning of the irrigation-pumping age of 1970s (2.4 × 106 m3/d) in some of the lower reaches. The net Ganges river water reduction could jeopardize domestic water supply, irrigation water requirements, river transport, ecology etc. of densely populated northern Indian plains. River water reduction has direct impact on food production indicating vulnerability to more than 100 million of the population residing in the region. The results of this study could be used to decipher the groundwater-linked river water depletion as well as the regional water security in other densely populated parts of the globe.


S1 Study locations S2 Gravity Recovery and Climate Experiment (GRACE) calculations
S3 Baseflow simulation using PCR-GLOBWB S4 River reach response analyses by groundwater flow modeling S5 Hydrogeochemical and isotope modeling S6 Non-parametric statistical analysis S7 Food security estimates

S1 Study locations
We have selected eight major Ganges river reaches in its middle and lower course and their adjoining aquifers in India, in which various hydrogeological and hydrogeochemical parameters were sampled and/or measured. The river water is sampled at 22 locations (Fig. S1). Groundwater samples (n=227 for stable isotope analysis; n=560 for hydrochemical reaction modelling) are collected from the aquifer locations adjoining the river channel. The black filled circles are representing the locations for river water sampling (n=22). The groundwater sampling locations are shown through dashed rectangles. The reaches are selected based on the confluence points of major tributaries. All the maps were made using QGIS software (version 2.12) and standard graphical illustrators

S2 Gravity Recovery and Climate Experiment (GRACE) calculations
The mascon solutions are derived from the change in distance between two GRACE satellites between January 2003 and December 2014. Recently released RL05 mascon solutions are obtained from the Jet Propulsion Laboratory, National Aeronautics and Space Administration (NASA) in order to get terrestrial water storage (TWS) change information (Watkins et al., 2015;Bhanja et al., 2016). Several techniques are applied on the data for obtaining TWS solutions (http://grace.jpl.nasa.gov/data/get-data/jpl_global_mascons/ accessed on 26 April, 2016). Degree 2 and order 0 coefficients in the data are replaced by the coefficients derived from Satellite Laser Ranging reported by Cheng et al. (2011). Degree 1 coefficients are derived from a process described in Swenson et al. (2008). Post glacial rebound signal in the data are removed by the process developed by Geruo and Wahr (2013). The entire globe is divided into 3 0 equal area spherical mass concentration blocks in the mascon approach (Watkins et al., 2015). Introduction of the a priori information lead to removal of correlated noise (stripes), therefore, post-processing filters are not required to apply on the data-this is the major improvement from the spherical harmonics approach in terms of reducing inherent errors (see the details in Watkins et al., 2015).

S3 Baseflow simulation using PCR-GLOBWB
The global-scale hydrological model, PCR-GLOBWB, have been used to simulate baseflow at a resolution of 0.1 0 × 0.1 0 (Wada et al., 2014). The PCR-GLOBWB model has three soil layers (Fig. S5), the first two layers are not directly connected to groundwater. The third and deepest soil layer in PCR-GLOBWB model is considered to be a groundwater reservoir replenished by active recharge from the upper layers (Wada et al., 2014). Groundwater recharge is calculated after removing capillary rise from the deep percolation. The baseflow (Q b ) is characterized as drainable water from the reservoir to the river. A first order linear reservoir approach has been implemented in the model to compute groundwater storage and baseflow (Wada et al., 2014). In the model, baseflow is a product of groundwater storage (GW S3 ) and recession coefficient (Wada et al., 2014). The detailed methodology can be found in Wada et al. (2014). Recession coefficient, J, represents average residence time of groundwater. Global reservoir coefficient values are computed by combining global drainage length and lithology information (Wada et al., 2014). Finally, local runoff is computed as a sum of direct runoff, interflow and baseflow. The groundwater layer of PCR-GLOBWB has been represented using a 1-layer MODFLOW model (Wada et al., 2016). The MODFLOW model is infused with groundwater recharge and the river discharge data from the PCR-GLOBWB data (Wada et al., 2016). River and drain package of MODFLOW has been used to model the groundwater and surface water interaction (Wada et al., 2016). Three different types of rivers and streams are considered to compute the river discharge: large rivers (width >25 m), small rivers (width <25 m), and the springs and streams (Wada et al., 2016). The following equations are solved to simulate baseflow in different time period (Wada et al., 2014): where, k sat,S3 represent the hydraulic conductivity of the saturated layer or aquifer (S3); f d represents the drainable porosity; aquifer depth is represented by D c ; drainage length is represented as B c . B c is estimated through drainage density analyses. The drainable porosity and saturated hydraulic conductivity are linked to a simplified lithological map of the world by Durr et al. (2005). A global map of the reservoir coefficient have been obtained after solving the equation: Parameter sensitivity analyses of D c and k sat,S3 have been provided in Sutanudjaja et al. (2011). Local runoff (Q l ) has been estimated as the combination of the direct runoff (Q d ), baseflow (Q b ) and the interflow (Q i ).
River network has been used to route the local runoff.

S4 River reach response analyses by groundwater flow modeling
Discretization: The model domain was discretized at a grid resolution of 1000 m (x)  1000 m (y)  15 m (z) for 22 layers from land surface to a depth of 300 m below mean sea level. SRTM 90 DEM with horizontal resolution of 3 arc-sec/90 m; vertical resolution of 1 m was used for surface elevation. Spatial location data were projected to Universal Transverse Mercator (UTM 1983), Zone 45 (central meridian 87, spheroid GRS 80, reference latitude 0, scale factor of 0.9996, false easting 500,000, false northing 0). The coordinates of origin of the modeled area for both type of modeling were easting 600,000, northing 2,380,000. The modeled study area is bounded in the north, west, and east by the Rivers Ganges, Bhagirathi-Hoogly and Jalangi/Ichamati, respectively, and to the south by the Bay of Bengal. The study area is 335 km in length and 110 km in width.
Recharge: The total recharge inflow in the study area was calculated from monthly climatic data obtained from 25 observations stations in and near the study (1811 to 2013) obtained from the NCDC database (http://www.ncdc.noaa.gov/oa/pub/data/ghcn/v2/ghcnftp.html, accessed on January 20, 2015). Based on these data, potential evapotranspiration (PET) values were estimated for each location according to Malmstrom (1969): T is average monthly temperature (°C).
Evapotranspiration (ET) values were calculated (Pike 1964) for each of these 25 locations as: where, P = average precipitation (mm/month) PET = potential evapotranspiration (mm/month) Potential recharge (R g ) was then calculated as: The monthly R g values were aggregated according to the previously described seasons. These values were then used to form 2-D daily R g grids of the study area by kriging in Surfer, where where R s = cumulative mean monthly rainfall for that season (in mm) d s = number of days in that season Boundary conditions and surface water: Boundary conditions were assigned for headdependent flux boundaries as specified head for the rivers in the area e.g. Bhagirathi Hoogly (Ganges river), Ichamati river, Churni river, Matla river etc.. Constant-head for the Bay of Bengal was used following the natural physiographic and hydrologic conditions existing in the area, with a buffer ≥ ~20 km on all sides from the domain boundary. The groundwater-river water interaction was simulated by the river package (RIV). Scarcity of surface water data limited invocation of more sophisticated routines. The rivers were incorporated as specified head boundaries, divided into 11 reaches (1 reach for Ganges towards Bangladesh and 4 reaches for the Bhagirathi-Hoogly channel of the Ganges) and were assigned attributes like channel width (W) and length (L), stream stage, channel bed elevation and thickness (T), and sediment type based on data collected from the field or from various sources. Conductance (C) data of river beds for the study area, C values were calculated as (in m 2 /day): where K is the hydraulic conductivity of the bed sediments. The Bay of Bengal was set as a constant-head (0 m) boundary, which was assigned to the southernmost rows at the edge of the study area along the coastline (low tide line).
Hydraulic Parameterization: The hydraulic heterogeneity was incorporated from a fullydeveloped three-dimensional, finite-difference grid, block-centric lithologic model. Based on the acquired lithologs (269 logs of existing drinking water, irrigation and observation wells), a lithologic model was developed. The lithologic solid model was created using a block-centric finite-difference grid. The solid model was developed to a depth of 300 m below MSL and the topographic elevation was defined by the SRTM 90-DEM grid. The gridding was done by eight nearest-neighbor methodology with 3-D interpolation by average minimum distance. The lithologies in the logs were grouped into three hydraulic types: gravel or sand (aquifer), sandy clay (low hydraulic conductivity aquifer) and clay (aquitard). The sensitivity of the model was tested by varying the horizontal and vertical spacing of the nodes and an optimized final model was constructed that was least sensitive to changes in spatial resolution (i.e., smaller grid sizes). The resolution of the final model was 1000 m (x)  1000 m (y)  2 m (z). The resulting discretization consisted of 135 x nodes  335 y nodes  165 z nodes, thus having 7,462,125 solid model nodes, each with a voxel volume of 2,000,000 m 3 . Four nodes were assigned around each interpolated datapoint. Transmissivity (T) values of sediments were calculated iteratively, using the method of Mace (2001), from 132 specific capacity tests in the study: In this method, T i is the initially assumed and T f is the final T value, S c is specific capacity, t p is duration of pumping, r w is the well radius, and S is storativity (set to an average of 0.03 46 ). The average hydraulic conductivity (K x ) was calculated from the T and modeled aquifer thickness for the four districts, and K z assumed equal to 1/10 K x . Initial Porosity was assigned as 0.2. The final values of the aquifer parameters were ascertained by inverse modeling and model response to sensitivity analyses. The layers were allowed to have seepage from the top and leakage through the base, making them hydraulically connected. The top-most layer was defined as unconfined and the second layer as unconfined (T varying). The rest of the layers were defined as confined, as the water table was not expected to fall >30 m. Abstraction and simulation: Groundwater abstraction was simulated from shallow wells (STW) with low to medium yield (10-20 m 3 /hr) and high-yield (up to 200 m 3 /hr), deep irrigation tube wells (DTW) by electric (18-20 h.p.) turbine or submersible pumps. Their current estimated numbers for each reach for each year till 2007 was obtained from MoWR (1993MoWR ( , 2001MoWR ( , 2005MoWR ( , 2013. The pump numbers for recent years were obtained from informal information. The pumpage for each model grid cell was calculated by following the equation: Here, Q = the assigned total pumpage (outflow) value to each grid cell for a specified time (L 3 ); A = area of each district within the model boundary (L 2 ); N = number of pumps; r = pumpage rate (L 3 /T); t = hours of operation per day (T); a = area of each grid cell. Assuming the rates of population growth similar to growth rates between 1991-2011, projected number of pumps and pumping rates for 2017-2050 for each reach were calculated according to the following equation: Where N= projected number of pumps; η = number of pumps per capita in 2011. Response to adjoining river reaches were calculated follow Cosgrove and Johnson where L j is rate of depletion of the river reach j and Rj is the response function.
Inverse modeling was performed for parameter estimation for hydraulic properties and recharge using recent 197 hydraulic-head values. In the inverse mode, initial values of aquifer parameters were based on the values obtained from previous studies or field measurements. The values obtained at the end of the analyses were used for forward simulations and determining the calibration statistics. Sensitivity of hydraulic heads in target wells to model parameters were also analyzed by sequentially varying the hydraulic properties, recharge, stream bed conductance, stages of major streams, and groundwater abstraction.
Calibration and sensitivity analyses: Because water level data were only available for the study area for pre-monsoon in presence of pumping, the PREM-p model was calibrated in steady state against known head values from head values observered from more than 120 locations across the study area. Sensitivity analyses were performed on this calibrated model (PREM-p) by sequentially varying the K x-y of aquifers and aquitards, S and n values. The model was found to be most sensitive to the K x values of dominant lithologies. Using the selected values from the analyses, the final model was recalibrated and found to have good agreement with present water level and hydraulic-head contours. The differences between the observed and computed values were probably a cumulative manifestation of a specified head boundary in the vicinity, discrepancy in topographic elevations, exclusion of STW, and uncertain aquifer parameters. The finalized values obtained from the sensitivity analyses were then used for defining the aquifer properties of the scenario simulations

S5 Isotope and hydrogeochemical modeling
In order to find out the percentage contribution of groundwater in Ganges river water, a three component isotope mixing model has been used to characterize the fractions of glacial/ice including snow melt (ICE), surface runoff (SR) and the groundwater discharge/baseflow (GW) component. Isotopic composition (d 18 O) and electrical conductivity (EC) values for ice/snowmelt water in higher reaches of Ganges river are collected from Lambs (2000). The relative contribution between glacier and snow melts are not considered separately due to comparatively lesser contribution of both of the components in middle to lower reaches, we used the average value. Isotopic composition and EC values of SR are measured from the samples collected from the major tributaries during pre-monsoon time. Samples are collected from the sites located just before the confluence zone in the tributary. In this process, the SR component would also contain the ICE fraction particularly in Yamuna, Ghagar, Kosi, Fulahar rivers. However, in consideration of conservation of mass, we believe this is the best way for estimating the GW fraction. The d 18 O and EC values from groundwater samples are collected from 227 locations in the periphery of the Ganges river. The unavailability of river water samples corresponding to all of the groundwater sampling locations led us to use the nearest available river water isotope data of the groundwater sample locations. There are two major assumptions: 1. There are no other components available except those three components, ICE, SR and GW. Total mass of the river water (T) can be represented using mass of the individual components: 2. The average values of d 18 O and EC are not changing in the entire season.
The following equations are solved to calculate the fractions 15 .
The sum of fractions of the three components is equal to 1 or 100%.
ice + sr + gw = 1 ice .d ICE + sr .d SR + gw .d GW = d T ice . E ICE + sr . E SR + gw . E GW =E T Solving these three equations provide the individual fractions, shown below: The main objective of the modeling exercise was to characterize the geochemical pathways and processes that accounts for the chemical changes leading to hydrogeochemical evolution of the shallow groundwater because of mixing with inflowing Ganges river water reaches from the vicinity. Ion activities and saturation indices (SI = log(IAP K T -1 ), where IAP is the ion activity product and K T is the equilibrium solubility constant of a phase at ambient temperature) were calculated using PHREEQC (Parkhurst and Appelo, 1999)  Models for probable, general geochemical mass-transfer reactions along potential groundwater flow paths in the Gangetic aquifers have been developed in this study. The models were generated for our testing the major hypothesis the Ganges river water is inflowing in the aquifers adjoining reaches that area previously known to be gaining. In inverse modeling, the groundwater flow system and chemical composition of the aquifer is assumed to be steady-state conditions. The approach also assumes that the initial and final end member waters that are used in the simulations are representative groundwaters from the same, or convergent flow path(s), and accordingly the minerals employed in the model calculations must occur within the aquifer matrix. Further, it is presumed that the effects of hydrodynamic dispersion along the modeled flow paths are negligible (Plummer et al., 1983;Zhu and Anderson, 2002;Johannesson and Tang, 2009). Ion-exchange reactions (between CaX 2 , MgX 2 , NaX and KX) are modeled following the Gaines-Thomas convention (Gaines and Thomas, 1953) and equilibrium constants (Appelo and Postma, 1993;Fryar et al., 2001;Mukherjee and Fryar, 2008): where X represents the solid phase exchanger. Following procedures of Fryar et al.(2001), Mukherjee and Fryar (2008) and Raychowdhury et al.(2014), minimal reaction-path (inverse) models were developed using PHREEQC (Parkhurst and Appelo, 1999) version 2.17 for pairs of wells located along possible flowpaths, as suggested by previous studies (Mukherjee et al., 2007;Mukherjee and Fryar, 2008;Mukherjee et al., 2012). The selected pairs of wells used as end members are only based on their distinctiveness in hydrochemistry, and they are located up gradient-down gradient to each other along suggested general flow direction. The inaccuracy in the choices of end-member waters and in analytical data was handled by including uncertainties (25%, incrementally increasing from an initial 5% default) in the inverse modeling. The importance of this exercise is for testing of hypothesis for possible, "general" type of groundwater-surface water reactions, regionally, within the study areas and not specific to any particular flow path or pairs/sets of wells. A total of 76 different inverse geochemical models in the eight aquifers were executed within the acceptable uncertainty range, resulting to 1377 solutions. All of the modeled reaction flow paths were designed for evolution through silicate weathering (quartz, amorphous silica, feldspars, clay minerals etc.), cation exchange (Ca 2+ /Mg 2+ ,Ca 2+ /Na + , Ca 2+ /K + , Na + /Mg 2+ , and Na + /K + ) and carbon cycling (CO 2 dissolution and exsolution; CH 2 O (organic carbon) oxidation) or combination of these along each flow path in order to explain the gross hydrochemistry. The major hydrogeochemical reactions that were considered in the models include: In natural systems, carbonate mineral dissolution can be written as (Garrels and Mackenzie, 1971): Reactions involving silicate weathering by incongruent dissolution can be generalized by following Appelo and Postma (1994), Drever (1997) and Faure (1998)  The reaction for cation exchange can be generalized as (following Karanth, 1994): 0.5Ca 2+ + Na-clay  0.5Ca-clay + Na + The models were balanced on Clconcentrations (assuming Clas a conservative solute), and halite dissolution was used for 23 models as a proxy for mixing with connate water from the proto-Bay of Bengal. Because of uncertainty in the measurements of Al (not detected at most locations), and the fact that inclusion of alumino-silicate phases led to an impractically large number of preliminary model solutions, Al was not used in the final reaction-path models. Instead, Silica balance was used to evaluate the role of silicate weathering, as a Si-balanced model would indicate minimal Si influx from weathering, whereas a failure would suggest substantial silicate hydrolysis along the flow path.

S6 Non-parametric statistical analysis
The Hodrick-Prescott (HP) (Hodrick and Prescott, 1997) filter, a non-parametric trend estimator, has been used in this study for computing the trend analysis. The trends and cycles are separated in this approach upon solving the following equation (Hodrick and Prescott, 1997): Where T t+1 and T t-1 are the trend component at the time steps t+1 and t-1, respectively. The cyclical components can be obtained after removing the trends from its real values; also, the long-term mean of the cyclical component closes to zero (Hodrick and Prescott, 1997). Variability in cyclical components are reduced through the selection of a suitable value for the smoothing parameter () (Hodrick and Prescott, 1997). The selected value of  for quarterly data is 1600 (Hodrick and Prescott, 1997;Ravn and Uhlig, 2002). The trend values are estimated following Sen's slope method (Sen, 1968). The non-parametric Sen's slope approach calculates the regression co-efficient using Kendal's rank correlation estimates (Sen, 1968).

S7 Food security estimates
Total cereal production during Rabi season (cultivation during pre-monsoon time, cropping time October-March and growing time March to June) data are collected state-wise (Uttar Pradesh, Bihar, Jharkhand and West Bengal) during 2010 to 2013 (source: https://data.gov.in/, accessed on January 15, 2016). Total cereal productions during Rabi (winter) season are 41.2 to 44.1 million tonnes in the study area. Total cereal production during Kharif (monsoon) season (cultivation during monsoon time, cropping time July-October) has also been used to get the total annual cereal production (source: https://data.gov.in/). The baseflow component is reaching negative values or river capture condition in 2050 in most of the lower reaches with an average value ~0 (Fig. 3). Therefore the surface water quantity will be reduced as a function of near zero contribution from baseflow component in 2050. This will lead to reduction in surface water irrigation during pre-monsoon time. Assuming no change in total precipitation pattern, groundwater irrigation pattern, cropping area, crop type, crop yield and crop import and export, we have predicted the amount of total cereal production during pre-monsoon/Rabi season. Average daily energy requirement for an adult is assumed to be 2500 cal/gram (FAO, 2004), with carbohydrate contribution approximately 50%. Total dietary energy requirement from carbohydrates, has been estimated for the projected population in 2050. Total cereal production in 2050 has been estimated using a combination of the reduced total cereal during Rabi season and the unchanged total cereal production for Kharif season. Crop deficiency values are estimated for 2050 and reported as number of people affected. The flow-chart (Fig. S8) showing the steps followed for estimating the number of population effected.