Abstract
Protection of groundwater resources is essential to ensure quality and sustainable use. However, predicting vulnerability to anthropogenic pollution can be difficult where data are limited. This is particularly true in the Sahel region of Africa, which has a rapidly growing population and increasing water demands. Here we use groundwater measurements of tritium (3H) with machine learning to create an aquifer vulnerability map (of the western Sahel), which forms an important basis for sustainable groundwater management. Modelling shows that arid areas with greater precipitation seasonality, lower permeability and deeper wells or water table generally have older groundwater and less vulnerability to pollution. About half of the modelled area was classified as vulnerable. Groundwater vulnerability is based on recent recharge, implying a sensitivity also to a changing climate, for example, through altered precipitation or evapotranspiration. This study showcases the efficacy of using tritium to assess aquifer vulnerability and the value of tritium analyses in groundwater, particularly towards improving the spatial and temporal resolution.
Main
Groundwater remains an important but poorly understood and poorly managed resource for water security and sustainable development throughout much of Africa1,2,3,4. In the western Sahel and Sudanian savanna regions (roughly the area of west Africa between the humid coastal zone and the Sahara Desert), groundwater can be expected to take on ever-increasing importance due to stresses on water security brought about by climate change5 and some of the world’s highest population growth rates6. However, the supply of groundwater represents just one aspect, with water quality playing an equally important role in determining the sustainable use of groundwater.
To help protect groundwater resources now and in the future, aquifer vulnerability mapping7,8 is undertaken to better understand the aquifers and areas that are more susceptible to contamination from pollution at the surface9,10. Attention can then be focused on the human activities taking place in these areas11. However, recent recharge also indicates the possibility of the aquifer regularly being recharged, with implications for the sustainable use of the aquifer12,13,14.
One common way of estimating aquifer vulnerability has been to evaluate and combine various parameters such as water-table depth, aquifer matrix grain size and climatic conditions at well locations to define a theoretical aquifer vulnerability index15,16, with the DRASTIC method16 being widely used, which incorporates seven different hydrogeological parameters. While this type of approach can be applied effectively where there is a high concentration of wells, it is less useful where there are large spatial gaps in the data7.
Another possibility is to consider the presence of elevated concentrations of nitrate as an indicator of surface pollution having reached the aquifer. Nitrate is a rather conservative compound that is abundant in fertilizer and animal manure, as well as untreated human waste, that infiltrates into the ground17 and is relatively easily measured. However, this assumption does not apply to non-agricultural areas or those currently less impacted by human activities.
An alternative that avoids the drawbacks of these methods is to assess groundwater vulnerability using measurements of the radionuclide tritium (T or 3H) in groundwater12. Tritium has a half-life of 4,500 days (12.3 years)18 and is produced naturally in trace amounts in the stratosphere through the interaction of high-energy cosmic rays and nitrogen as well as through human activities involving nuclear reactions. Essentially all tritium immediately gets oxidized into tritiated water (HTO) and incorporated into the water cycle19. As tritium generally does not get produced below the surface of Earth (except for negligible amounts in some granites20,21), its presence in groundwater generally derives from rainwater infiltration and represents an ideal tracer of infiltration and groundwater recharge as well as surface water and groundwater interactions22,23.
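As a rough illustration of how the half-life translates into apparent groundwater ages, the sketch below applies simple exponential decay. It assumes piston flow with no mixing and a constant tritium input, which is a simplification not made in this study; the function names are hypothetical.

```python
import math

HALF_LIFE_YEARS = 12.3  # tritium half-life quoted in the text


def remaining_fraction(years: float) -> float:
    """Fraction of the initial tritium left after `years` of radioactive decay."""
    return 0.5 ** (years / HALF_LIFE_YEARS)


def apparent_age(ratio: float) -> float:
    """Years of decay needed to reduce tritium to `ratio` of its input value.

    Assumes a single water parcel (no mixing of waters of different ages).
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in the interval (0, 1]")
    return -HALF_LIFE_YEARS * math.log2(ratio)


if __name__ == "__main__":
    for ratio in (1 / 2, 1 / 3, 1 / 4):
        print(f"ratio {ratio:.2f} -> about {apparent_age(ratio):.1f} years")
```

Under these assumptions, groundwater retaining one-half, one-third or one-quarter of its input tritium would have apparent ages of about 12.3, 19.5 and 24.6 years, respectively.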
Due to cosmic charged particles being drawn towards Earth’s poles, tritium gets formed in the stratosphere at greater concentrations at higher latitudes. It can subsequently enter the troposphere and water cycle particularly via the springtime breakdown of the tropopause at 30°–60° N, known as the ‘spring leak’, as well as through the formation of large cumulonimbus clouds that penetrate the stratosphere, also predominantly in the Northern Hemisphere24. Variations in the spatial distribution of naturally produced tritium in the troposphere also result from dilution by the oceans19 and the presence of continents25, over which tritium is recycled through evapotranspiration and precipitation and not returned to the oceans. Due to the minute concentrations in which it is found, tritium is reported in terms of tritium units (TU), defined as one tritium atom to 10^18 hydrogen atoms.
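To give a sense of scale for the tritium unit, the following sketch converts TU to an activity concentration from first principles, using one tritium atom per 10^18 hydrogen atoms and the 12.3-year half-life. The molar mass of water and the function name are assumptions introduced for illustration only.

```python
import math

AVOGADRO = 6.02214076e23              # atoms per mole
WATER_MOLAR_MASS_G = 18.015           # g/mol, assumed value for ordinary water
HALF_LIFE_S = 12.3 * 365.25 * 86400   # half-life from the text, in seconds
T_PER_H_AT_1_TU = 1e-18               # definition of one tritium unit


def bq_per_kg(tritium_units: float) -> float:
    """Approximate tritium activity (Bq per kg of water) for a value in TU."""
    hydrogen_atoms = 2 * (1000.0 / WATER_MOLAR_MASS_G) * AVOGADRO
    tritium_atoms = hydrogen_atoms * T_PER_H_AT_1_TU * tritium_units
    decay_constant = math.log(2) / HALF_LIFE_S  # per second
    return tritium_atoms * decay_constant


if __name__ == "__main__":
    print(f"1 TU is roughly {bq_per_kg(1.0):.3f} Bq/kg")
```

With these inputs, 1 TU works out to roughly 0.12 Bq per kilogram of water, which shows how small the measured activities are.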
In addition to the natural production of tritium through cosmic rays, human-created nuclear reactions contribute considerably to the amount of tritium present on Earth. In particular, above-ground thermonuclear tests have made the largest contribution of tritium to the troposphere and water cycle. From October 1952 through 1962, atmospheric tests by the United States, Union of Soviet Socialist Republics and United Kingdom injected about 600 kg into the atmosphere, and limited tests by China and France during 1967–1980 added about another 20 kg (ref. 26). However, due to tritium’s half-life of just 12.3 years, most of this ‘bomb’ tritium has now decayed such that the tritium now found in global precipitation is dominated by that produced naturally22,27,28, although this amounts to roughly the total tritium from present-day localized anthropogenic sources25,29,30,31,32. Anthropogenic sources of tritium may also exist locally, for example, from nuclear facilities or landfills33.
Whether aquifer vulnerability is estimated on the basis of a derived index or a tracer already present in the environment, such as nitrate or tritium, these vulnerability ratings are typically available only at point locations. To assess groundwater vulnerability at other locations where no direct data exist, interpolation among the available point data can be conducted. For example, Kriging has been used to create a map of groundwater 3H concentrations in South Africa, also incorporating some additional environmental parameters34. However, such spatial interpolation is not feasible where few to no data points exist. A recent approach to overcome this is the use of machine learning with spatially continuous predictor variables, as has been previously carried out for various geogenic groundwater contaminants35,36,37. Other studies have applied this technique to map the concentration of nitrate in Germany38, the United States39,40 and the African continent41 to represent exposure to anthropogenic influences. However, such spatial machine-learning modelling has never before been applied with the concentration of tritium as the target variable, which has the advantage of being part of the water molecule itself and is thus distributed throughout the water cycle.
In this work, we use machine learning with groundwater 3H data and environmental predictor variables to develop an aquifer vulnerability map of the western Sahel. Further analysis is conducted with additional stable-isotope and other physicochemical data. The purpose of the prediction map is to help guide efforts to protect vulnerable groundwater resources as well as highlight areas with recent aquifer recharge, which has implications for their sustainable use and is relevant for the UN Sustainable Development Goals, in particular, indicator 6.3.2 on ambient water quality.
Results
After associating the localized amounts of tritium found in modern precipitation25 with each of the 1,234 groundwater measurements in the available dataset42, 50 measurements containing at least 130% of the 3H concentration in modern precipitation were removed (Methods). As these removed data points were distributed throughout the entire study area and do not appear to be clustered, they are probably not due to unique local conditions. Thresholds of 1/2, 1/3 and 1/4 of the values of 3H in precipitation were then applied to the groundwater measurements and are plotted in Fig. 1. (This and subsequent maps were prepared using ArcGIS43 and/or QGIS44.) The threshold of one-half of modern precipitation was generally used for all subsequent analyses and modelling.
Kendall rank correlations were calculated between the proportion of high 3H concentrations (greater than one-half of 3H in modern precipitation) and binned averages of other in situ physicochemical parameters as well as spatially and numerically continuous environmental parameters; they are listed and plotted in Supplementary Table 1 and Supplementary Figs. 1 and 2.
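For readers unfamiliar with the statistic, a Kendall rank correlation between binned averages of a parameter and the proportion of high 3H in each bin could be computed as sketched below. This is a plain-Python tau-b written for illustration; the binning scheme and the example values are invented and are not taken from the dataset.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b, which corrects for ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denominator == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Hypothetical bins: mean well depth (m) and share of high-3H samples per bin.
    mean_depth = [10, 30, 50, 70, 90]
    share_high = [0.80, 0.60, 0.65, 0.30, 0.10]
    print(kendall_tau_b(mean_depth, share_high))
```

In this made-up example, nine of the ten bin pairs are discordant and one is concordant, giving a strongly negative tau of -0.8.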
Tritium prediction model
A random forest (RF) model of tritium in groundwater in the Sahel was created using spatially continuous predictor variables, achieving an average area under the curve (AUC) of 0.80 on the validation data (other performance statistics in Supplementary Table 2). Partial dependence plots (PDPs) of the final and thus most constraining variables are shown in Supplementary Fig. 3. The aquifer vulnerability map produced by applying the model to the predictor variables is displayed in Fig. 2. This vulnerability map shows a tendency for more northerly areas (towards the Sahara Desert) to be less vulnerable, whereas more southerly areas (towards the humid east–west coastal zone) are generally more vulnerable. The importance of the 11 predictor variables used in the model is shown in Supplementary Fig. 4, with elevation ranking highest.
Modelling with depth and deuterium excess
Two additional RF models were created, each with an additional in situ parameter, depth and deuterium excess (D-excess), but keeping all spatially continuous model parameters the same as for the main model result in Fig. 2. The variable importance and PDPs of depth and D-excess are shown in Supplementary Fig. 5. Although the PDPs of both of these in situ parameters show a clear relationship with the presence of high tritium values in groundwater, and depth has an importance about twice that of the next most important variable (Supplementary Fig. 5), overall neither of these models performed any better than the original model based on spatially continuous parameters alone. That is, the AUC, kappa and balanced accuracy of the model with depth were 0.80, 0.44 and 0.72 and for the model with D-excess were 0.80, 0.44 and 0.73, whereas the geospatial model (Fig. 2) had, if anything, a slightly better performance of 0.80, 0.45 and 0.73.
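The three performance statistics quoted above (AUC, kappa and balanced accuracy) can be computed from true labels and model outputs as in the following sketch. It does not fit any model; the labels, probabilities and the 0.5 cut-off are invented for illustration and do not reproduce the study's workflow.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def cohen_kappa(y_true, y_pred):
    """Agreement between prediction and truth, corrected for chance."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return (observed - expected) / (1 - expected)


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]                   # 1 = high tritium (hypothetical)
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]   # hypothetical model probabilities
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(auc(y_true, scores), cohen_kappa(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

AUC is independent of any probability cut-off, whereas kappa and balanced accuracy depend on the cut-off used to turn probabilities into classes, which is why the three are reported together.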
To further explore the depth dependence of the model, two separate models were also created using tritium concentrations from 0 to 50 m depth (n = 361) and >50 m depth (n = 337) (Supplementary Fig. 6). These models likewise used the same predictor variables as in the main model (Fig. 2 and Supplementary Fig. 6a). Note that the total number of 3H concentration data points with depth information (n = 698; Supplementary Fig. 6b) is considerably lower than that of the full dataset (n = 1,184; Fig. 1) used for the main model (Fig. 2).
Sensitivity of tritium threshold
While the main groundwater vulnerability (tritium) model in Fig. 2 is based on half of the concentration of 3H in local precipitation (Methods), further RF classification models were produced using tritium thresholds that were set by dividing the modern local tritium concentration in precipitation by either three or four (Supplementary Fig. 7).
Sub-regional models
In addition, smaller-scale vulnerability maps were created (Supplementary Fig. 8), focusing on the areas in and around Senegal, Burkina Faso and Lake Chad, which lies at the intersection of Cameroon, Chad, Niger and Nigeria. The same modelling procedure as for the Sahel-wide model (also defining high 3H by dividing the local 3H concentration in precipitation by two) was undertaken, incorporating only the 3H data points from within these areas. Further details are provided in Supplementary Table 3.
Predictor variables
Figure 2 shows that aquifer vulnerability tends to be lower in more northerly areas and higher in more southerly areas. This may be due, for example, to higher aridity (Fig. 3a), which is greatest in the north of the study area (Supplementary Fig. 9e). Similarly, the sandy subsoil of northern areas (Supplementary Fig. 9i) is negatively associated with high tritium (Supplementary Fig. 3), and the fraction of absorbed photosynthetically active radiation (associated with wetter environments) is positively associated with high tritium (Supplementary Fig. 3).
Elevation
For elevation, the most important variable in the model (Supplementary Fig. 4), the PDP indicates higher tritium for elevations above about 150 m (Fig. 3b). However, in this case, the relationship may be more coincidental than causal as the hydrogeologically relevant climatic and geological properties happen to coincide with sharp distinctions in elevation. The lithology classes most associated with the presence of high and low tritium concentrations are metamorphics and siliciclastic sedimentary rocks, respectively (Fig. 3c), which also correspond to the dominance of these rock types in Burkina Faso and Senegal (Supplementary Fig. 9h). Considering the available data, the metamorphic rocks are associated with a higher permeability, whereas the siliciclastic sedimentary rocks have a lower permeability45. As such, higher permeability offers a more intuitive explanation for the presence of younger groundwater than does elevation. Likewise, high tritium concentrations are associated with lower aquifer porosity (Fig. 3d), which is associated with the metamorphic rocks (Supplementary Fig. 9h).
Depth
It is reasonable to expect that deeper water should be older and that the spatially continuous variable of depth to water table46 would be an important predictor of tritium in groundwater. However, this parameter was not among the most important variables during the automated recursive feature elimination, and its correlation with high tritium proved to be weak and not statistically significant (Supplementary Fig. 2). This may be due to the global parameter of water-table depth46 not being well determined in the Sahel region. Nevertheless, the actual reported depth42, which may represent the vertical mixing and sampling of more than one aquifer, does show a strong negative correlation with high tritium (Fig. 3e), and its inclusion as an in situ parameter in a separate model confirms this. (Note that this model could not be used to create a vulnerability map due to the depth information being point data and not spatially continuous.) Despite depth having the greatest model importance and a strong inverse relationship with high tritium concentrations (Supplementary Fig. 5a), the overall model performance remained unchanged. This indicates that the variables used in the main model (Fig. 2) are able to constrain the distribution of tritium very well, despite all being surface parameters.
The 3H concentration data containing depth information (n = 698) were split into subsets of ≤50 m depth (n = 361) and >50 m depth (n = 337) and used to make two respective models (Supplementary Fig. 6c,d). Although the performance of these models (AUC = 0.72 for ≤50 m depth and AUC = 0.83 for >50 m depth) differs from that of the main model result (AUC = 0.80), the patterns of high and low vulnerability are very similar, with more-vulnerable areas in the south, including Burkina Faso and surrounding areas, and less-vulnerable areas in the north, including most of Senegal (Supplementary Fig. 6a,c,d). This indicates that despite the main model incorporating data from all depths, it is generally valid for shallower depths (≤50 m) as well as greater depths (>50 m). Incidentally, the available data are representative of the depths typically used in each respective area42. Furthermore, the more extensive area of applicability (AOA) of the main model (Supplementary Fig. 6a versus Supplementary Fig. 6c,d) is probably a consequence of it using more data (n = 1,184).
D-excess
Another analysis was carried out with the parameter D-excess (Methods), which is known to be strongly correlated with precipitation recycling in the tropics47. In this analysis, D-excess had relatively low importance, although its PDP shows a clear inverse relationship with high tritium (Supplementary Fig. 5b). These results suggest that more cycles of precipitation and re-evaporation before infiltration lead to greater groundwater residence times in the region. This is consistent with the lower modelled vulnerability in the drier, northern areas of the study region, which are also further from West Africa’s humid east–west-trending coastal zone. Furthermore, the two water isotope components of the D-excess calculation, δD and δ18O, are also correlated strongly (τ ≈ 0.8) with the exceedance of the locally determined threshold of high tritium (Supplementary Fig. 1). Aside from being expected from a hydrological perspective, this confirms that the method of determining the binary (high and low) values of tritium is effective.
Despite being point data, the strong dependence of tritium on depth as well as the association with D-excess may still be captured by some combination of the other spatially continuous predictor variables, which can act as proxies for these parameters. For example, greater aridity will generally lead to a lower water table, and higher elevations in West Africa are found further from the coast (Supplementary Fig. 9f), allowing for the possibility of more cycles of precipitation and evaporation before groundwater infiltration.
Choice of tritium threshold
To test the sensitivity of the modelled aquifer vulnerability to how the tritium threshold is set, the main model (Fig. 2) was compared with the similarly created model variants that use tritium cut-offs determined by dividing the local tritium concentration in precipitation by 3 or 4 (Supplementary Fig. 7). Although the model performance stayed approximately constant with each incremental increase in the divisor from 2 to 3 to 4 (increasing the prevalence of high-tritium cases; Supplementary Table 2) and the three models appear quite similar, it must be taken into account that they were created using differently balanced binary datasets. That is, by setting the threshold for high tritium to successively lower levels, the proportion of high 3H cases increased from 0.40 for dividing by two (main model result in Fig. 2) to 0.50 for dividing by three and to 0.57 for dividing by four. To evenly compare these three models with each other, their binary results are plotted in Supplementary Fig. 10 using a cut-off corresponding to the model probability at which sensitivity equals specificity, where the model is equally effective at predicting low and high values. Viewed another way, the high predictions represent younger, more-vulnerable groundwater, and low predictions represent older, less-vulnerable groundwater. Even after standardizing the models in this manner, the choice of threshold within the considered range ultimately appears to have little bearing on the resulting aquifer vulnerability maps. That is, there is only a slight increase in high-vulnerability areas as smaller tritium concentrations are considered.
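The idea of a cut-off at which sensitivity equals specificity can be sketched as a search over candidate probabilities for the one that minimizes the gap between the two rates. The function names and the example data below are hypothetical, and the search over observed score values is an assumption rather than the procedure used in the study.

```python
def sens_spec(y_true, scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are predicted as high."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for t, s in pairs if t == 1 and s >= cutoff)
    fn = sum(1 for t, s in pairs if t == 1 and s < cutoff)
    tn = sum(1 for t, s in pairs if t == 0 and s < cutoff)
    fp = sum(1 for t, s in pairs if t == 0 and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


def balanced_cutoff(y_true, scores):
    """Observed score at which |sensitivity - specificity| is smallest."""

    def gap(cutoff):
        sensitivity, specificity = sens_spec(y_true, scores, cutoff)
        return abs(sensitivity - specificity)

    return min(sorted(set(scores)), key=gap)


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]                   # 1 = high tritium (hypothetical)
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]   # hypothetical model probabilities
    cutoff = balanced_cutoff(y_true, scores)
    print(cutoff, sens_spec(y_true, scores, cutoff))
```

Choosing the cut-off this way puts models trained on differently balanced datasets on a comparable footing, because each binary map then errs equally often on high and low cases.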
Sub-regional aquifer vulnerability maps
Unsurprisingly, the areas of higher confidence or AOA in the model are found where more groundwater tritium data exist, particularly in and around Senegal, Burkina Faso and Lake Chad. The sub-regional models created with only the data from each of these areas (Supplementary Fig. 8) were tuned specifically to the conditions present in each area. In general, this should lead to an improved result for each sub-region as the model need not account for potentially differing relationships between the predictor and target variables that may exist in other areas. However, if there are insufficient data to properly characterize the different environments within a given region, data from other areas may be able to help fill these gaps if the conditions elsewhere happen to be similar.
In the case of the three sub-regions modelled separately here, there is no clear change in the extent of the AOA of each sub-region between the main model result (Fig. 2) and the independent models (Supplementary Fig. 8). Nevertheless, each of the sub-regional models appears to be more detailed and exhibit greater variability in the modelled probabilities and their distribution.
Comparison with nitrate
Nitrate is often used as an indicator of aquifer vulnerability as it is abundant in fertilizer and animal manure and gives a strong indication of anthropogenic influence. In addition to higher concentrations of nitrate being associated with some health issues, it may also suggest the presence of hazardous organic compounds found in pesticides and herbicides applied in conjunction with fertilizer. As such, the very strong Kendall rank correlation of 0.90 between nitrate and high tritium concentrations (Fig. 3f) confirms the efficacy of using tritium as an indicator of aquifer vulnerability. However, tritium offers the added advantages of being an integral part of the water molecule (and therefore the water cycle) and decaying at a rate that is compatible with the timescales of concern for aquifer vulnerability. It can therefore indicate the vulnerability of areas that have not yet been exploited or contaminated. Furthermore, nitrate has geogenic as well as anthropogenic sources, which can complicate an assessment of anthropogenic influence and aquifer vulnerability. Conversely, the background concentrations of tritium in the atmosphere are now close to natural levels (following the atmospheric thermonuclear weapons tests from mainly the 1950s and 1960s), which makes it an unambiguous indicator of recent groundwater recharge and therefore a useful means for assessing vulnerability.
Implications of aquifer vulnerability
To better understand the implications of the main aquifer vulnerability map (Fig. 2), its binary version (Supplementary Fig. 10a) was combined with population density and cropland (Fig. 4). Specifically, the areas of high aquifer vulnerability (high tritium) in Supplementary Fig. 10a were combined with the estimated population density in 2020 based on a ‘middle-of-the-road’ shared socioeconomic pathway48 as well as with croplands as indicated by a global land-use model49.
This analysis shows that the activities associated with high-density populations in, for example, southern Mali, Burkina Faso and northwest Nigeria pose a threat to underlying unconfined aquifers (Fig. 4). Likewise, extensive agricultural areas above vulnerable aquifers are found in northeast Ghana, northern Benin, Nigeria and northern Cameroon and the area immediately surrounding Lake Chad (Fig. 4b). Included in both maps, but obscured somewhat from view, are the areas of modelled high aquifer vulnerability that are considered less reliable (outside the AOA). However, should the model prove to be accurate in these areas, the aquifers of many more areas in the region could also be vulnerable to pollution, particularly in Nigeria. To better characterize this risk, further groundwater assessments in these areas (ideally including tritium) would be necessary.
Discussion
The tritium prediction map (Fig. 2) may be used to estimate the vulnerability of aquifers to surface pollution. That is, the presence of a higher tritium concentration indicates recent aquifer recharge and thus surface water that infiltrates more rapidly into the subsurface. Hence, contaminants such as organic or other chemicals used in industrial applications, untreated sewage or fertilizers (nutrients) along with pesticides and herbicides applied in agricultural activities can seep more easily from the surface to the aquifers (understanding that different liquids with different physical properties may travel at different rates in the subsurface). Furthermore, the presence of recent recharge means that changes to the rate or timing of recharge could occur in the face of climate change, which may alter evapotranspiration patterns and bring more or less precipitation at different intensities and at different times of the year, all of which may affect aquifer recharge. Aside from highlighting where aquifers can be easily contaminated by human activities, the aquifer vulnerability map shows locations of relatively recent recharge, although not the rate of recharge50,51. As such, the sustainability of groundwater resources, the balance between extraction and recharge, must also be considered in deciding to what degree a groundwater resource should be exploited.
Limitations and assumptions
As with any modelling, there are various sources of uncertainty, such as the accuracy of the predictor and target variables. In this case, a loss of fidelity in the spatially continuous predictors may also have ensued as a consequence of approximations and estimations in fitting data to a given spatial resolution. This may result in inaccuracies of the values of the predictor variables at given geographical coordinates. Likewise, a lower density of points of measured groundwater tritium generally leads to a less constrained, more uncertain model in those areas. Although the AOA calculation helps delineate where a given model can be reasonably applied, a lack of sampling points may nevertheless result in considerable target variance (tritium) in the study area not being captured by the model. The modelling of groundwater tritium across the Sahel has also involved certain assumptions and simplifications due to the lack of reliable depth information across the study area. This has necessitated the use of surface parameters, which nevertheless have proved to be effective proxies in constraining the distribution of tritium.
The groundwater samples may represent a mixture of groundwater from aquifers at different depths and/or a combination of different ages. One aspect of the latter is that some nuclear bomb-era tritium may be incorporated in a given sample and obscure the relationship with respect to modern background tritium concentrations. For this reason, groundwater samples with tritium concentrations clearly exceeding that in present-day precipitation, that is, by 30%, were not used (Methods). Nevertheless, due to measurement errors or the timing and magnitude of infiltration events, it cannot be ruled out that some samples may contain a component of bomb-era water (from the 1950s and 1960s). However, the presence of a considerable component of ~60-year-old groundwater (as opposed to much younger groundwater) may be relevant for water management that aims to sustainably protect aquifers and avoid overexploitation. Indeed, due to the relatively short half-life of tritium, there may soon be no more discernible signal of bomb-era tritium, which would increase the possible temporal resolution of similar tritium-based studies27. In any case, the presence of any tritium signifies recent as opposed to fossil groundwater.
Outlook
The modelling presented here showcases a practical application of utilizing groundwater tritium measurements with machine learning to create aquifer vulnerability maps with a high spatial resolution (between 250 m and 1,000 m), which can be used in conjunction with other information, such as aquifer storage and groundwater recharge rates, in safeguarding groundwater resources. The study also highlights the utility and need for more widespread sampling and analyses of tritium in precipitation and groundwater to help better characterize the water cycle at this interface. For example, the area of applicability of the models indicates where model confidence is higher but also highlights where additional tritium groundwater measurements are needed to more accurately characterize the aquifer vulnerability of the entire region. Furthermore, as more time passes and the anthropogenic signal vanishes towards natural background levels, tritium analyses will make it increasingly feasible to determine more-precise groundwater ages, which could then possibly be used to infer recharge rates, given an appropriate groundwater flow model.
Although the RF algorithm was used here, future studies could also consider other machine-learning methods for possible improvements in implementation or performance. Model performance could also be improved through more-detailed sample-depth information or if other predictor variables, such as three-dimensional hydrogeological properties, became available.
This study of aquifer vulnerability of the western Sahel of Africa identifies areas of relatively recent groundwater recharge and areas with much older groundwater, in particular for Senegal, Burkina Faso, Lake Chad and their immediately surrounding areas. While this indicates the locations of groundwater bodies that are more vulnerable to surface contamination, the map presented also identifies areas that may receive regular recharge, with associated implications for sustainable extraction. Such information is important for local water resource managers as well as international development agencies in planning for the wise use and long-term protection of groundwater resources, particularly in light of rapid population growth and an increasingly variable climate.
Methods
Groundwater samples
The tritium data used in this study come from a dataset of groundwater samples taken between 2010 and 2016 throughout the western Sahel, ranging in latitude from approximately 6° N to 21° N (Fig. 1). These tritium samples were taken mainly in rural areas (avoiding anthropogenic influences) and stem from an International Atomic Energy Agency (IAEA) technical cooperation project designated by the code RAF7011 (ref. 42). The data were validated by the authors of each of the project’s five sub-regional reports and have a mean 3H concentration of 1.7 ± 1.4 TU (Supplementary Table 4). Most of the analyses used electrolytic enrichment and liquid scintillation counting and were carried out in laboratories in Morocco and Tunisia that have successfully participated in various IAEA 3H proficiency tests. The detection limit was generally 0.5 TU, and the mean reported analytical error was 0.25 TU.
In addition to tritium, 12 other physicochemical parameters were measured in the respective groundwater samples (descriptive statistics in Supplementary Table 4). Derived from the ratios of the stable water isotopes, deuterium excess, or D-excess, is indicative of the source regions of water vapour and is calculated from the relative surpluses of the heavier deuterium (2H or D) and 18O isotopes present in water:
D-excess = δD − 8 × δ18O
D-excess is sensitive to evaporation from the ocean as well as subsequent cycles of precipitation and re-evaporation on land and in the atmosphere. Although neither D-excess nor any of the other physicochemical parameters could ultimately be used to create a vulnerability map due to being point data and not spatially continuous, they are still useful in constraining the hydrological interpretations of the tritium values and the modelling results.
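As a small worked example, D-excess can be computed from the two isotope ratios as sketched below, assuming the conventional definition D-excess = δD − 8 × δ18O with both values in per mil. The sample values are invented for illustration.

```python
def d_excess(delta_d_permil: float, delta_18o_permil: float) -> float:
    """Deuterium excess (per mil), assuming the conventional definition."""
    return delta_d_permil - 8.0 * delta_18o_permil


if __name__ == "__main__":
    # Hypothetical sample lying on the global meteoric water line.
    print(d_excess(-30.0, -5.0))
    # Hypothetical sample enriched in 18O relative to that line.
    print(d_excess(-20.0, -2.5))
```

A value near 10 per mil corresponds to the global meteoric water line, whereas the second hypothetical sample yields 0 per mil.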
Preparation of data for modelling
Before being able to utilize the groundwater tritium measurements, it was necessary to determine the background concentration of tritium in precipitation, which varies by location, in particular by latitude but also by continentality and other factors25. For this purpose, the recent global isoscape model of Terzer-Wassmuth et al.25 was used. This model provides the distribution of tritium in precipitation in the modern, post-bomb era and, as such, represents the natural background of tritium. It utilizes the most recent precipitation tritium data available and accounts for many factors, including latitude, longitude, elevation, land-mass fraction, distance to coast and precipitation amounts. Its root mean square error in the Sahel region is 0.24 TU, which is nearly identical to the average reported analytical groundwater 3H error (0.25 TU). The values of tritium in precipitation were abstracted from the global model on the basis of the geographical coordinates of the groundwater samples.
As the concentration of tritium in some of the groundwater samples considerably exceeds that of tritium in modern precipitation, these measurements could obscure the interpretation of the modelling results. That is, for 3H concentrations in groundwater to greatly exceed that in modern precipitation, either a considerable amount of the groundwater must have infiltrated in the 1950s and 1960s, when atmospheric concentrations of tritium were exceedingly high, or there must be a local source of tritium, such as a nuclear facility or landfill site33. To correct for this while also allowing for measurement errors, groundwater samples with 3H concentrations at or above 130% of that in local precipitation were removed from the dataset before modelling. This affected 50 (4%) of the 1,234 samples in the dataset and reduced the number of data points available for modelling to 1,184.
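The exclusion rule can be sketched as a simple filter. This is an illustrative Python version, not the authors' R code. The function name, the tuple layout and the sample values are assumptions made for the example.

```python
def keep_for_modelling(gw_tu: float, precip_tu: float, max_ratio: float = 1.3) -> bool:
    """True if groundwater 3H is below max_ratio times the local precipitation 3H."""
    return gw_tu < max_ratio * precip_tu


# Hypothetical (groundwater TU, modelled precipitation TU) pairs.
samples = [(1.0, 3.0), (2.9, 3.0), (5.2, 3.0), (0.4, 2.5)]
kept = [s for s in samples if keep_for_modelling(*s)]
print(f"kept {len(kept)} of {len(samples)} samples")

# Bookkeeping reported in the text: 50 of 1,234 samples removed.
print(1234 - 50, round(100 * 50 / 1234))
```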
The local value of 3H in modern precipitation was divided by two to determine the cut-off between younger (more vulnerable) and older (less vulnerable) groundwater. That is, if the groundwater tritium exceeds the threshold, the sample is classified as high/younger and more vulnerable. Although where to place this cut-off is essentially arbitrary, one-half of the concentration of local precipitation implies a groundwater age that would be of interest from a groundwater management perspective, that is, at least the half-life of 3H or 12.3 years but potentially considerably older depending on the exact recharge history and mixing at a given site. Furthermore, this threshold was practical from a modelling perspective in that 40% of the groundwater tritium measurements were classified as high. For comparison, the effect of modelling with different tritium thresholds was tested by dividing the modern local concentrations of 3H in precipitation by 3 or 4 instead of 2 (Discussion). Note that to test for the depth dependence of 3H concentration in groundwater, depth was included in a separate model as an in situ predictor variable (point data).
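To make the link between the threshold divisor and the implied minimum age explicit, the sketch below classifies a sample and computes the pure-decay time needed for tritium to fall from the local precipitation value to the threshold. It assumes simple radioactive decay with a constant input and no mixing, which is a simplification, and uses the 12.3-year half-life quoted above. All names and sample values are hypothetical.

```python
import math

HALF_LIFE_YEARS = 12.3  # tritium half-life as quoted in the text


def is_high(gw_tu: float, precip_tu: float, divisor: float = 2.0) -> bool:
    """True (younger, more vulnerable) if groundwater 3H exceeds precipitation 3H / divisor."""
    return gw_tu > precip_tu / divisor


def decay_age_years(divisor: float) -> float:
    """Years of decay needed to reduce a concentration by the factor `divisor`.

    Assumption: pure decay, constant input, no mixing.
    """
    return HALF_LIFE_YEARS * math.log2(divisor)


for divisor in (2, 3, 4):
    print(divisor, round(decay_age_years(divisor), 1))

# Hypothetical sample: 1.2 TU in groundwater, 3.0 TU in local precipitation.
print(is_high(1.2, 3.0), is_high(1.2, 3.0, divisor=3))
```

Under these assumptions, divisors of 2 and 4 correspond to one and two half-lives, respectively.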
Predictors of aquifer vulnerability
Thirty-four potentially relevant spatially continuous environmental parameters were assembled that may relate to the presence of high tritium concentrations in groundwater and therefore the occurrence of relatively recent recharge (Supplementary Table 5). These parameters stem for the most part from the categories of climate, geology/soil, hydrology or topography and have a spatial resolution between 7.5″ and 30.0″ (arcseconds), which corresponds to approximately 250 m–1,000 m at the Equator. It was necessary that each predictor variable be spatially continuous over the model domain to later enable the creation of aquifer vulnerability maps.
Comparison of tritium with other parameters
To obtain a first-order estimate of the factors relating to the presence of tritium in groundwater, Kendall rank correlations were calculated between the proportion of high tritium measurements and the spatially continuous parameters as well as the other physicochemical parameters measured in situ. This was accomplished by first binning each independent parameter into bins that each contain the same number of data points, with the number of bins set using the Rice rule:
$$k = \lceil 2\,n^{1/3} \rceil$$
where k is the number of bins and n is the number of data points.
The proportion of high tritium measurements was then determined for each bin and the Kendall rank correlation calculated (Supplementary Figs. 1 and 2 and Supplementary Table 1).
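The binning and correlation procedure can be sketched as follows. This is an illustrative Python version, not the authors' R code. It assumes that the Rice rule result is rounded up, that bins are formed by sorting and splitting into near-equal counts, and it uses a plain Kendall tau-a without tie correction. The function names and the synthetic data are hypothetical.

```python
import math


def rice_bins(n: int) -> int:
    """Number of bins from the Rice rule, rounded up (rounding is an assumption)."""
    return math.ceil(2 * n ** (1 / 3))


def equal_count_bins(pairs, k):
    """Sort (value, is_high) pairs by value and split into k bins of near-equal size."""
    ordered = sorted(pairs)
    n = len(ordered)
    edges = [round(i * n / k) for i in range(k + 1)]
    return [ordered[edges[i]:edges[i + 1]] for i in range(k)]


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs, no tie correction."""
    n = len(xs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# Synthetic example: the chance of "high" tritium rises with the predictor value.
data = [(x, (x % 4) < (x // 10)) for x in range(40)]
bins = equal_count_bins(data, rice_bins(len(data)))
proportions = [sum(high for _, high in b) / len(b) for b in bins]
tau = kendall_tau_a(list(range(len(bins))), proportions)
print(rice_bins(1184), len(bins), round(tau, 2))
```

A library routine with tie handling (tau-b) would be preferable for real data, where bins often share the same proportion.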
Aquifer vulnerability modelling
The RF machine-learning algorithm52 was used for classification to generate a predictive model of the presence of high tritium in groundwater. Random forest was chosen because of its efficient and effective use in other groundwater quality studies35,37,53,54 and was implemented in the R programming language55 using the ranger package56.
The RF algorithm creates an ensemble (or forest) of decision trees grown under varying conditions to avoid overfitting and create a more robust and effective model. Each tree in the forest is grown with a different random subset of the input dataset, which is produced by sampling with replacement. In addition, a randomly selected subset of the predictors is made available at each node of a tree. The final result is produced by averaging the results of all the generated trees. A separate dataset to be used only for testing was not created due to the already limited size of the full dataset (n = 1,184), which is distributed across the vast Sahel region. However, in principle, a dataset to be used only for testing is advantageous if a large number of data are available57.
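The two sources of randomness described above, and the final averaging step, can be illustrated with a short Python sketch. It does not grow any trees and is not the ranger implementation. The predictor names, the seed and the vote values are hypothetical.

```python
import random


def bootstrap_sample(n_rows: int, rng: random.Random):
    """Row indices drawn with replacement, same size as the input dataset."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def node_candidates(predictors, mtry: int, rng: random.Random):
    """Random subset of predictors offered to a single tree node."""
    return rng.sample(predictors, mtry)


def forest_probability(tree_votes):
    """Average of per-tree votes (1 = high tritium, 0 = low tritium)."""
    return sum(tree_votes) / len(tree_votes)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = bootstrap_sample(1184, rng)
print("rows never drawn for this tree:", 1184 - len(set(rows)))

# Hypothetical predictor names, for illustration only.
predictors = ["precipitation", "permeability", "slope", "aridity"]
print(node_candidates(predictors, 2, rng))
print(forest_probability([1, 0, 1, 1]))
```

Because each bootstrap sample leaves out a share of the rows, every tree sees a slightly different dataset, which is what decorrelates the trees in the ensemble.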
To identify the most relevant and effective predictors of tritium content in groundwater, recursive feature elimination was employed58, whereby the 20% least important variables were removed in successive RF iterations. The final selection of variables corresponds to the forest with the smallest number of variables and an error rate within one standard deviation of the minimum error rate found among all RF iterations.
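The selection rule can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors' implementation: the number of variables dropped per step is rounded with at least one removed, the one-standard-deviation band is taken around the best-performing forest, and the error values are invented for illustration.

```python
def elimination_schedule(n_start: int, drop_frac: float = 0.2, n_min: int = 2):
    """Sizes of successive predictor sets when dropping drop_frac of variables per step.

    Assumption: the number dropped is rounded, with at least one variable removed.
    """
    sizes = [n_start]
    while sizes[-1] > n_min:
        n = sizes[-1]
        n_drop = max(1, round(drop_frac * n))
        sizes.append(max(n_min, n - n_drop))
    return sizes


def select_size(results):
    """Smallest predictor set whose error is within one SD of the minimum error.

    results: list of (n_variables, error_rate, error_sd) tuples.
    Assumption: the SD used is that of the forest with the minimum error.
    """
    best = min(results, key=lambda r: r[1])
    limit = best[1] + best[2]
    eligible = [r for r in results if r[1] <= limit]
    return min(eligible, key=lambda r: r[0])[0]


print(elimination_schedule(34)[:4])

# Hypothetical error rates for four forests of decreasing size.
results = [(34, 0.200, 0.02), (27, 0.190, 0.02), (22, 0.205, 0.02), (18, 0.220, 0.02)]
print(select_size(results))
```

The rule deliberately prefers a smaller model over the absolute best one when the difference in error is within the noise of the estimate.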
Following the selection of the predictor variables, two further modelling parameters were tuned with cross validation using the caret package59. These were the number of randomly selected predictor variables made available at each node (‘mtry’) and the minimum number of data points required in a node (minimum node size), both of which have implications for potential overfitting. The range of possible values tested for mtry was 1 to the total number of predictors and 1 to 5 for the minimum node size (typical default value 1 for RF classification).
Before creating a final RF model, data were duplicated in the training dataset if the classification (high or low tritium) would remain the same when either adding or subtracting the measurement error reported with the 3H concentration. This was done to weight those measurements with greater confidence of being correctly classified. If no error was reported in the dataset, which was the case for 26% of the measurements, a realistic error was estimated by multiplying the measurement by the median reported error-to-measurement ratio, which was 0.15. Taking this correction into account, 90% of the measurements could be duplicated when used for training, that is, they were deemed reliably classified. No rows were duplicated in the validation dataset. Following this, each RF was then grown with 1,000 trees.
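A minimal Python sketch of this weighting step is given below. It assumes each training row carries a measured value, an optional reported error and a local threshold. The dictionary keys and the sample values are hypothetical, and the authors' R code may differ in detail.

```python
MEDIAN_ERROR_RATIO = 0.15  # median reported error-to-measurement ratio from the text


def fill_error(gw_tu, error_tu):
    """Use the reported error, or estimate one from the median error ratio."""
    return error_tu if error_tu is not None else MEDIAN_ERROR_RATIO * gw_tu


def reliably_classified(gw_tu, error_tu, threshold_tu):
    """True if the class is unchanged when the error is added or subtracted."""
    return ((gw_tu - error_tu) > threshold_tu) == ((gw_tu + error_tu) > threshold_tu)


def weight_training_rows(rows):
    """Return the training rows with reliably classified rows included twice."""
    weighted = []
    for row in rows:
        weighted.append(row)
        err = fill_error(row["tu"], row["err"])
        if reliably_classified(row["tu"], err, row["threshold"]):
            weighted.append(row)
    return weighted


# Hypothetical training rows (TU values are illustrative only).
train = [
    {"tu": 2.0, "err": 0.25, "threshold": 1.0},  # clearly high: duplicated
    {"tu": 1.1, "err": 0.25, "threshold": 1.0},  # straddles threshold: kept once
    {"tu": 0.4, "err": None, "threshold": 1.0},  # no reported error: estimated
]
print(len(weight_training_rows(train)))
```

Duplication is applied to training rows only, so the validation statistics are not inflated by repeated samples.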
To test the sensitivity of the threshold for high tritium on the model results, separate RF models were created using the thresholds of the estimated tritium in local precipitation divided by three as well as by four. As such, the range of thresholds investigated spans from one to two tritium half-lives, that is, the original threshold of the 3H in local precipitation divided by either two (one half-life) or four (two half-lives).
Model evaluation
The performance of the model was assessed through the cross validation of 100 RF iterations, whereby the model in each iteration was trained using a randomly selected 80% of the data (training dataset) and evaluated on the remaining 20% (validation dataset). The statistics calculated and collected for each iteration were sensitivity, specificity, balanced accuracy, kappa and the area under the receiver operating characteristic curve (AUC). The statistics from each RF iteration were then averaged to estimate the predictive capability of the model.
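The statistics named above follow from the confusion matrix and the predicted probabilities of a single validation split. The sketch below uses standard textbook definitions in plain Python, with invented counts and scores. It is not the evaluation code used in the study.

```python
def classification_stats(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, balanced accuracy and Cohen's kappa."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (observed - expected) / (1 - expected),
    }


def auc(pos_scores, neg_scores) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical validation results for one of the 100 iterations.
stats = classification_stats(tp=40, fn=10, tn=45, fp=5)
print({k: round(v, 3) for k, v in stats.items()})
print(auc([0.9, 0.8], [0.1, 0.85]))
```

Averaging these values over all iterations gives the summary statistics reported for the model.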
The influence of each predictor variable on the model was assessed by considering the importance of each variable to the model as well as by creating partial dependence plots (PDPs). Importance was quantified in terms of the mean decrease in the unbiased Gini impurity60, or Gini index, which indicates how effectively a variable performs classification and divides the dataset into the two tritium classes (high and low). A variable is considered more important (better performing) if the decrease in Gini index is greater. To better understand the influence of each variable, PDPs were created that show the model’s response to different values of a variable when all of the other variables are held constant at their average values. Both the importance and PDPs were calculated once using the final model, which was created utilizing the complete dataset.
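For intuition, the decrease in Gini impurity produced by a single split can be computed as follows. This Python sketch shows only the plain, uncorrected impurity decrease for one node. The unbiased importance used in the study adds a bias correction that is not reproduced here, and the class counts are hypothetical.

```python
def gini(n_high: int, n_low: int) -> float:
    """Gini impurity of a node holding two classes."""
    n = n_high + n_low
    if n == 0:
        return 0.0
    p = n_high / n
    return 2 * p * (1 - p)


def gini_decrease(parent, left, right) -> float:
    """Impurity decrease when a parent node is split into left and right children.

    Each argument is a (n_high, n_low) tuple of class counts.
    """
    n = sum(parent)
    return (
        gini(*parent)
        - (sum(left) / n) * gini(*left)
        - (sum(right) / n) * gini(*right)
    )


# Hypothetical split of 100 samples on one predictor.
print(round(gini_decrease(parent=(50, 50), left=(40, 10), right=(10, 40)), 3))
```

Summing such decreases over every node in which a variable is used, and averaging across trees, gives the mean decrease in impurity for that variable.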
Aquifer vulnerability map
The final RF model was applied to the corresponding predictor variables to create a prediction map of the occurrence of high tritium concentrations in groundwater. The differing spatial resolution of the predictor datasets was rectified by resampling the lower-resolution rasters (10″ and 30″) to match that of the higher-resolution rasters (7.5″), such that all information from the predictors was maintained. As such, the aquifer vulnerability maps represent this mix of resolutions, although they have a nominal resolution of 7.5″.
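The text does not state which resampling method was used. As one possible illustration, the Python sketch below applies nearest-neighbour resampling, which copies coarse cell values onto a finer grid without altering them. The function name, the grid values and the assumption of a shared top-left origin are all hypothetical, and the resolutions only need to share the same angular unit.

```python
def resample_nearest(grid, src_res: float, dst_res: float):
    """Resample a 2D grid to a finer resolution by nearest-neighbour lookup.

    grid: list of rows; src_res and dst_res: cell sizes in the same angular unit.
    Assumes both grids share the same top-left origin.
    """
    n_src_rows, n_src_cols = len(grid), len(grid[0])
    n_rows = round(n_src_rows * src_res / dst_res)
    n_cols = round(n_src_cols * src_res / dst_res)
    out = []
    for r in range(n_rows):
        # Source row containing the centre of the destination cell.
        src_r = min(n_src_rows - 1, int((r + 0.5) * dst_res / src_res))
        row = []
        for c in range(n_cols):
            src_c = min(n_src_cols - 1, int((c + 0.5) * dst_res / src_res))
            row.append(grid[src_r][src_c])
        out.append(row)
    return out


# Hypothetical 2 x 2 coarse raster resampled from cell size 30 to 7.5.
coarse = [[1, 2], [3, 4]]
fine = resample_nearest(coarse, src_res=30.0, dst_res=7.5)
print(len(fine), len(fine[0]))
```

With a resolution ratio of four, each coarse cell becomes a 4 × 4 block of identical fine cells, so no predictor values are changed by the resampling.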
The reliability of the prediction map was estimated by evaluating the similarity of the values of the variables in the predicted areas to those associated with the groundwater tritium concentrations used to create the model. This was carried out with the CAST package61, which calculates a dissimilarity index (DI) for each pixel in the prediction area, which is then used to define an area of applicability (AOA) for the model. The DI uses the distance in predictor space between two points (dissimilarity) and is weighted by the variable importance. It takes the dissimilarity of the closest training data point, standardized by the average dissimilarity of all training points, and can range in value from 0 to infinity. The AOA was defined using the 95% quantile of the DI in the training data.
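The DI described above can be sketched in a few lines of Python. This is a simplified illustration and not the CAST implementation: it assumes the predictors are already standardized, uses Euclidean distance with importance weights, and omits the cross-validated calculation of training DI values from which the AOA threshold is derived. The points and weights are hypothetical.

```python
import math


def weighted_distance(a, b, weights) -> float:
    """Euclidean distance in predictor space, each axis scaled by its importance."""
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))


def mean_pairwise_distance(train, weights) -> float:
    """Average distance between all pairs of training points."""
    dists = [
        weighted_distance(train[i], train[j], weights)
        for i in range(len(train))
        for j in range(i + 1, len(train))
    ]
    return sum(dists) / len(dists)


def dissimilarity_index(point, train, weights, d_bar) -> float:
    """Distance to the nearest training point, standardized by d_bar."""
    return min(weighted_distance(point, t, weights) for t in train) / d_bar


# Hypothetical standardized predictor values and importance weights.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = (1.0, 1.0)
d_bar = mean_pairwise_distance(train, weights)
near = dissimilarity_index((0.1, 0.0), train, weights, d_bar)
far = dissimilarity_index((3.0, 3.0), train, weights, d_bar)
print(round(near, 3), round(far, 3))
```

A pixel whose DI exceeds the chosen threshold lies outside the AOA, meaning the model is extrapolating beyond the predictor conditions seen in training.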
Focused sub-regional models
Three separate additional models were created of the sub-regions surrounding Senegal, Burkina Faso and Lake Chad, all of which contain a high concentration of data points due to targeted activity within these areas in the RAF7011 IAEA project42. Setting a smaller geographical domain offers the possibility of producing a more accurate model that focuses only on the relationships between the predictor and target variables (and processes) in the area. The geographical boundaries used for these sub-regions and a summary of the tritium data contained within are provided in Supplementary Table 3. The same procedure of variable selection by means of recursive feature elimination starting with 34 potential predictor variables was conducted as for the Sahel-wide model.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The groundwater tritium data used in this study were taken from the publicly available source cited in the manuscript. GeoTIFF rasters of the Sahel aquifer vulnerability maps are available at the ERIC/open database62 (https://doi.org/10.25678/000CGY) and can be viewed on the Groundwater Assessment Platform (www.gapmaps.org).
Code availability
The code used in analysis and modelling in this study was developed with the R programming language and is available at ERIC/open database62 (https://doi.org/10.25678/000CGY).
References
Braune, E. & Xu, Y. The role of ground water in sub-Saharan Africa. Groundwater 48, 229–238 (2010).
Cuthbert, M. O. et al. Observed controls on resilience of groundwater to climate variability in sub-Saharan Africa. Nature 572, 230–234 (2019).
Pavelic, P., Giordano, M., Keraita, B. N., Ramesh, V. & Rao, T. Groundwater Availability and Use in Sub-Saharan Africa: A Review of 15 Countries (International Water Management Institute, 2012).
Nkiaka, E., Bryant, R. G., Okumah, M. & Gomo, F. F. Water security in sub‐Saharan Africa: understanding the status of sustainable development goal 6. Wiley Interdiscip. Rev. Water 8, e1552 (2021).
Dragoni, W. & Sukhija, B. S. Climate change and groundwater: a short review. Geol. Soc. Spec. Publ. 288, 1–12 (2008).
Population Growth (Annual %) (World Bank, 2022); https://data.worldbank.org/indicator/SP.POP.GROW
Taghavi, N., Niven, R. K., Paull, D. J. & Kramer, M. Groundwater vulnerability assessment: a review including new statistical and hybrid methods. Sci. Total Environ. 822, 153486 (2022).
Best, K. B., Miro, M. E., Kirpes, R. M., Kaynar, N. & Chesler, A. N. Data-driven decision support tools for assessing the vulnerability of community water systems to groundwater contamination in Los Angeles County. Environ. Sci. Policy 124, 393–400 (2021).
Jasechko, S. et al. Global aquifers dominated by fossil groundwaters but wells vulnerable to modern contamination. Nat. Geosci. 10, 425–429 (2017).
Sorensen, J. et al. Emerging contaminants in urban groundwater sources in Africa. Water Res. 72, 51–63 (2015).
Gorelick, S. M. & Zheng, C. Global change and the groundwater management challenge. Water Resour. Res. 51, 3031–3051 (2015).
van Rooyen, J., Watson, A. & Miller, J. Combining quantity and quality controls to determine groundwater vulnerability to depletion and deterioration throughout South Africa. Environ. Earth Sci. 79, 255 (2020).
MacDonald, A. M. et al. Mapping groundwater recharge in Africa from ground observations and implications for water security. Environ. Res. Lett. 16, 034012 (2021).
Aeschbach-Hertig, W. & Gleeson, T. Regional strategies for the accelerating global problem of groundwater depletion. Nat. Geosci. 5, 853–861 (2012).
Stempvoort, D. V., Ewert, L. & Wassenaar, L. Aquifer vulnerability index: a GIS-compatible method for groundwater vulnerability mapping. Can. Water Resour. J. 18, 25–37 (1993).
Shirazi, S. M., Imran, H. & Akib, S. GIS-based DRASTIC method for groundwater vulnerability assessment: a review. J. Risk Res. 15, 991–1011 (2012).
Mateo-Sagasta, J., Zadeh, S. M. & Turral, H. More People, More Food, Worse Water?: A Global Review of Water Pollution from Agriculture (FAO, 2018).
Lucas, L. L. & Unterweger, M. P. Comprehensive review and critical evaluation of the half-life of tritium. J. Res. Natl Inst. Stand. Technol. 105, 541–549 (2000).
Michel, R. L. Tritium in the Hydrologic Cycle (Springer, 2005).
Andrews, J. & Kay, R. Natural production of tritium in permeable rocks. Nature 298, 361–363 (1982).
Phillips, J. & Easterly, C. Sources of Tritium (Oak Ridge National Lab., 1980).
Harms, P. A., Visser, A., Moran, J. E. & Esser, B. K. Distribution of tritium in precipitation and surface water in California. J. Hydrol. 534, 63–72 (2016).
Mahlangu, S., Lorentz, S., Diamond, R. & Dippenaar, M. Surface water–groundwater interaction using tritium and stable water isotopes: a case study of Middelburg, South Africa. J. Afr. Earth Sci. 171, 103886 (2020).
Michel, R. L., Jurgens, B. C. & Young, M. B. Tritium Deposition in Precipitation in the United States, 1953–2012 Report No. 2018–5086 (USGS, 2018).
Terzer-Wassmuth, S., Araguás-Araguás, L. J., Copia, L. & Wassenaar, L. I. High spatial resolution prediction of tritium (3H) in contemporary global precipitation. Sci. Rep. 12, 10271 (2022).
Rozanski, K., Gonfiantini, R. & Araguas-Araguas, L. Tritium in the global atmosphere: distribution patterns and recent trends. J. Phys. G 17, S523 (1991).
Eastoe, C., Watts, C., Ploughe, M. & Wright, W. Future use of tritium in mapping pre‐bomb groundwater volumes. Groundwater 50, 87–93 (2012).
Morgenstern, U. & Taylor, C. B. Ultra low-level tritium measurement using electrolytic enrichment and LSC. Isotopes Environ. Health Stud. 45, 96–117 (2009).
Morgenstern, U., Stewart, M. K. & Stenger, R. Dating of streamwater using tritium in a post nuclear bomb pulse world: continuous variation of mean transit time with streamflow. Hydrol. Earth Syst. Sci. 14, 2289–2301 (2010).
László, E., Palcsu, L. & Leelőssy, Á. Estimation of the solar-induced natural variability of the tritium concentration of precipitation in the Northern and Southern Hemisphere. Atmos. Environ. 233, 117605 (2020).
van Rooyen, J., Palcsu, L., Visser, A., Vennemann, T. & Miller, J. Spatial and temporal variability of tritium in precipitation within South Africa and it’s bearing on hydrological studies. J. Environ. Radioact. 226, 106354 (2021).
Lindsey, B. D., Jurgens, B. C. & Belitz, K. Tritium as an Indicator of Modern, Mixed, and Premodern Groundwater Age (USGS, 2019).
Tazioli, A., Fronzi, D. & Mammoliti, E. Tritium as a tracer of leachate contamination in groundwater: a brief review of tritium anomalies method. Hydrology 9, 75 (2022).
van Rooyen, J. D., Watson, A., Palcsu, L. & Miller, J. Constraining the spatial distribution of tritium in groundwater across South Africa. Water Resour. Res. 57, e2020WR028985 (2021).
Podgorski, J. & Berg, M. Global threat of arsenic in groundwater. Science 368, 845–850 (2020).
Podgorski, J. & Berg, M. Global analysis and prediction of fluoride in groundwater. Nat. Commun. 13, 4232 (2022).
Podgorski, J., Araya, D. & Berg, M. Geogenic manganese and iron in groundwater of Southeast Asia and Bangladesh—machine learning spatial prediction modeling and comparison with arsenic. Sci. Total Environ. 833, 155131 (2022).
Knoll, L., Breuer, L. & Bach, M. Nation-wide estimation of groundwater redox conditions and nitrate concentrations through machine learning. Environ. Res. Lett. 15, 064004 (2020).
Gurdak, J. J. & Qi, S. L. Vulnerability of recently recharged groundwater in principle aquifers of the United States to nitrate contamination. Environ. Sci. Technol. 46, 6004–6012 (2012).
Ransom, K. M., Nolan, B. T., Stackelberg, P., Belitz, K. & Fram, M. S. Machine learning predictions of nitrate in groundwater used for drinking supply in the conterminous United States. Sci. Total Environ. 807, 151065 (2022).
Ouedraogo, I., Defourny, P. & Vanclooster, M. Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale. Hydrogeol. J. 27, 1081–1098 (2019).
The Transboundary Aquifers Reports of the Sahel Region: IAEA RAF7011 Project (IAEA, 2017).
ArcGIS Pro 2.9.0 (Environmental Systems Research Institute, 2023).
QGIS.org. 3.16.9-Hannover. QGIS Geographic Information System (QGIS Association, 2023); http://www.qgis.org
Gleeson, T., Moosdorf, N., Hartmann, J. & Van Beek, L. A glimpse beneath Earth’s surface: GLobal HYdrogeology MaPS (GLHYMPS) of permeability and porosity. Geophys. Res. Lett. 41, 3891–3898 (2014).
Fan, Y., Li, H. & Miguez-Macho, G. Global patterns of groundwater table depth. Science 339, 940–943 (2013).
Cropper, S. et al. Comparing deuterium excess to large-scale precipitation recycling models in the tropics. NPJ Clim. Atmos. Sci. 4, 60 (2021).
Gao, J. Global Population Projection Grids Based on Shared Socioeconomic Pathways (SSPs), Downscaled 1-km Grids, 2010–2100 (SEDAC, 2019).
Friedl, M. A. et al. MODIS Collection 5 global land cover: algorithm refinements and characterization of new datasets. Remote Sens. Environ. 114, 168–182 (2010).
West, C., Rosolem, R., MacDonald, A. M., Cuthbert, M. O. & Wagener, T. Understanding process controls on groundwater recharge variability across Africa through recharge landscapes. J. Hydrol. 612, 127967 (2022).
West, C. et al. Ground truthing global-scale model estimates of groundwater recharge across Africa. Sci. Total Environ. 858, 159765 (2023).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Podgorski, J., Wu, R., Chakravorty, B. & Polya, D. A. Groundwater arsenic distribution in India by machine learning geospatial modeling. Int. J. Environ. Res. Public Health 17, 7119 (2020).
Podgorski, J. E., Labhasetwar, P., Saha, D. & Berg, M. Prediction modeling and mapping of groundwater fluoride contamination throughout India. Environ. Sci. Technol. 52, 9889–9898 (2018).
R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014); https://www.R-project.org/
Wright, M. N. & Ziegler, A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77, 1–17 (2017).
Kuhn, M. & Johnson, K. Applied Predictive Modeling Vol. 26 (Springer, 2013).
Diaz-Uriarte, R. & de Andrés, S. A. Variable selection from random forests: application to gene expression data. Preprint at https://arxiv.org/abs/q-bio/0503025 (2005).
Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. https://doi.org/10.18637/jss.v028.i05 (2008).
Nembrini, S., König, I. R. & Wright, M. N. The revival of the Gini importance? Bioinformatics 34, 3711–3718 (2018).
Meyer, H. & Pebesma, E. Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods Ecol. Evol. 12, 1620–1633 (2021).
Podgorski, J. et al. Podgorski_et_al_2024. ERIC/open https://doi.org/10.25678/000CGY (2024).
Acknowledgements
We thank S. Läubli for her help in analysing the tritium dataset at an early stage and L. Wassenaar for preliminary discussions. We also thank the authors of the RAF7011 reports, in which the data used in this study were published. This work was funded by the Swiss Federal Office of Energy (SFOE) (AVARIs project, grant number: SI-501914-01; J.P., R.K. and M.B.) with additional support of the Swiss Agency for Development and Cooperation (SDC) (WABES project, grant number: 7F-09963.02.01; J.P. and M.B.).
Funding
Open Access funding provided by Lib4RI – Library for the Research Institutes within the ETH Domain: Eawag, Empa, PSI & WSL.
Author information
Contributions
J.P., O.K., R.S. and M.B. conceived the project. J.P. conducted the modelling with input from O.K., L.A.-A., J.M. and M.B. L.A.-A., S.T.-W. and R.K. helped with interpreting the tritium data. The manuscript was drafted by J.P. with input from all other authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Sustainability thanks Uwe Morgenstern, Issoufou Ouedraogo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Figs. 1–10 and Tables 1–5.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Podgorski, J., Kracht, O., Araguas-Araguas, L. et al. Groundwater vulnerability to pollution in Africa’s Sahel region. Nat Sustain 7, 558–567 (2024). https://doi.org/10.1038/s41893-024-01319-5
DOI: https://doi.org/10.1038/s41893-024-01319-5