Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

# Assessing the population-wide exposure to lead pollution in Kabwe, Zambia: an econometric estimation based on survey data

## Abstract

This study quantitatively assessed the population-wide lead poisoning conditions in Kabwe, Zambia, a town with severe lead pollution. While existing data have reported concerning blood lead levels (BLLs) of residents in pollution hotspots, the data representing the entire population are lacking. Further, selection bias is a concern. Given the lack of compulsory testing schemes, BLLs have been observed from voluntary participants in blood sampling surveys, but such data can represent higher or lower BLLs than the population average because of factors simultaneously affecting participation and BLLs. To illustrate the lead poisoning conditions of the population, we expanded the focus of our surveys and then econometrically estimated the BLLs of individuals representing the population, including those not participating in blood sampling, using background geographic, demographic, and socioeconomic information. The estimated population mean BLL was 11.9 μg/dL (11.6–12.1, 95% CI), lower than existing data because of our wide focus and correction of selection bias. However, the scale of lead poisoning remained immense and 74.9% of residents had BLLs greater than 5 μg/dL, the standard reference level for lead poisoning. Our estimates provide a deeper understanding of the problem and a foundation for policy intervention designs.

## Introduction

Lead poisoning is one of the most serious and harmful consequences of environmental pollution. High levels of lead intake adversely affect the functioning of the circulatory and nervous systems, which can be fatal in extreme cases, whereas low-level exposure can also reduce cognitive ability and cause developmental disorders1,2,3,4. The adverse health effects can further result in poor school performance, lowered educational attainment and lifetime earning, and behavioural disorders5,6,7,8. While the use of lead in certain products, such as gasoline and paints, has been globally banned or reduced, lead poisoning remains imposing considerable costs to society, particularly in low- and middle-income countries, owing to the continued use of lead in various products, mining and smelting activities, and the lack of remediation for contaminated environments4,9,10.

Despite these alarming reports, representative data of the extent of lead poisoning among the entire population of Kabwe are lacking. This can be attributed to two issues related to data collection. First, most previous surveys have primarily focused on a small sample of data collected from children residing in areas around the mining sites. Although the pollution problem is the most acute in these areas, and children are generally at the highest risk of lead poisoning4,18, the data coverage has been far from representative. Second, while BLL data were obtained from those who voluntarily participated in studies, given the lack of public mechanisms for formal and compulsory blood lead testing, there could be selection bias in the data, a problem widely recognised in the medical, statistical, public health, and economics literature24,25,26,27,28. Data from voluntary (or self-selected) participants in studies generally can fail to reflect the conditions of the population if certain factors simultaneously affect participation decisions and the outcomes of interest. In the context of our study, the residents voluntarily participating in the blood sampling can be those particularly concerned about or suspected to have lead poisoning, and their BLLs can be lower or higher than the population average. Various background characteristics, including both observable (e.g. education, age, employment and living standards) and unobservable ones (e.g. health preferences), can also form their willingness and constraints to participate and directly affect BLLs. As the demographic composition, socioeconomic conditions and pollution levels are diverse in Kabwe, data reflecting these diversities and correcting for potential selection bias are essential.

The purpose of this study is to quantitatively assess the prevalence of lead poisoning among the entire population of Kabwe (administratively, the district of Kabwe). Our methodology to accomplish this purpose is twofold. First, we chose our sample individuals based on random sampling covering the entire Kabwe district. Although we could obtain BLL data from the subset of the chosen individuals who voluntarily participated in blood sampling, this expanded our focus beyond that of previous studies and helped define the target samples representing the population. Quick results of the blood sampling survey are available elsewhere22. Second, we employed econometric models to estimate BLLs for the representative sample individuals. Concurrently with the blood sampling survey, we conducted a household survey to obtain background socioeconomic, demographic, and geographic information for the sample individuals, including those who did not participate in the blood sampling survey. We then used these data to econometrically estimate the equation to determine BLLs and, finally, calculated the BLLs of the entire sample individuals. We paid attention to the potential differences in both observable and unobservable characteristics between the participants and non-participants under two econometric methods: ordinary least squares (OLS) and Heckman’s sample selection model24.

The contributions of our study are twofold. First, this study illustrates the severity and diversity of the lead pollution problem in Kabwe and help policymakers design remedial measures. This study is the first attempt to systematically obtain representative estimates of the lead poisoning conditions of residents in the entire Kabwe district. The estimated mean BLL was 11.9 μg/dL and 74.9% of residents had BLLs above the standard reference level of 5 μg/dL. Data representing the population are important to fully understand the pollution problem in Kabwe. Representative data can also serve as a foundation for a policy intervention design and cost–benefit analysis. Further, our estimates shed light on the extent of risks facing low- and middle-income countries, contributing to studies quantifying the global burden of general and pollution-related diseases29,30,31,32. Despite the substantial impact of lead poisoning on the global disease burden, there remain gaps in the literature, and existing data do not precisely depict the burden of diseases at contaminated sites31.

The second contribution is methodological and refers to the management of selection bias. Health surveys are often subject to selection bias analogous to our data, which can lead to biased conclusions27,28. The methodology adopted not only is consequential for bias mitigation in the current study. Although the details of our specifications would need modifications, our approach is applicable to many other cases where formal and compulsory testing for a disease is lacking.

## Methods

### Data collection and potential selection bias

We conducted two joint surveys from July to September 2017 in Kabwe: the Kabwe Household Socioeconomic Survey (KHSS) 2017 conducted by the Central Statistical Office of Zambia and University of Zambia under the supervision of the authors, and a BLL survey performed by the authors. The surveys were approved by the University of Zambia Research Ethics Committee (UNZAREC; REF. No. 012-04-16). Further approvals were granted by the Ministry of Health through the Zambia National Health Research Ethics Board and the Kabwe District Medical Office. The data were collected in accordance to the Declaration of Helsinki, and the informed consent was obtained from all the study participants including the parents/legal guardian of the minor subjects for participating in the study.

The two surveys were designed consistently and targeted the same sample households selected in the following two-step approach. In the first step, utilising the Zambia’s national census frame which divides the Kabwe district into 384 standard enumeration areas (SEAs), we randomly selected 40 SEAs across the entire district. In the second step, we randomly selected 25 households (and a few replacements) from each sampled SEA. The sampling weights were generated to account for population differences across the SEAs.

The KHSS 2017 conducted interviews with 895 households (4,900 individuals) at houses and collected data on socioeconomic, demographic and geographic information. The response rate was 88.2%, and we could regard the data adjusted by the sampling weights as representative of the entire Kabwe population (for more details of the survey, see the report33).

To obtain BLL data, we conducted a blood sampling survey concurrently with the KHSS 2017. For hygiene and ethical considerations, we selected 13 local clinics to perform the blood sampling, instead of collecting blood at houses. We invited up to four members (two children aged 10 years or younger and their parents or guardians) from each sample household for the blood sampling. We prioritised young children over children older than 10 years old. The invitations were made sequentially. We assigned identical venues and dates for households from the same SEAs. The typical assigned dates had a 3-day window from the day after the invitation. However, we allowed for some flexibility and sampled the blood of those who visited the clinic even after the assigned time window, as long as the clinic was operational for households from other SEAs. Therefore, the window for blood sampling was effectively the number of days from the day after the invitation until the pre-set blood sampling period in each clinic was over, which had a substantial variation across households from 3 days to a month. We revisit this feature of the survey window when setting up our econometric model later. A total of 372 households (41.6%) participated in the blood sampling and, on average, 2.2 members from the participating households provided blood samples.

We performed blood digestion and metal extraction as described by our previous study34 with minor modifications and measured BLLs using an Inductively Coupled Plasma-Mass Spectrometer (ICP-MS). In addition, we also measured BLLs with a portable analyser, LeadCare II, to obtain quick results22. However, we in this study focus on the ICP-MS data, considering their general accuracy. See the Supplementary Material Section S1 for details on the methods used to measure BLLs and the difference in the data between the two analysers.

Regardless of the accuracy of the techniques, however, we further need to account for the risk of selection bias in the BLL data. In the absence of formal and compulsory testing mechanisms, we relied on individuals’ voluntary (self-selected) visits to the clinics. However, the participants in blood sampling could have traits leading to higher or lower BLLs than the population. Such traits can include education, gender, age and living standards. The survey design did not prioritise children aged 11 years or older, and this could also contribute to the deviation of characteristics, although a small number of such children attended clinics. Moreover, certain unobservable characteristics affecting BLLs can further differ between the participants and non-participants. For example, those with greater preferences for health possibly had low BLLs but tended to participate in the blood sampling surveys, whereas those with a high innate physiological capacity for lead excretion possibly tended not to participate because they had low BLLs and did not perceive symptoms of lead poisoning. These issues can lead to selection bias, and the raw data observed from the voluntary participants can fail to illustrate the lead poisoning conditions of the population.

### BLL estimation approach

To correct for potential selection bias, we first estimated the equations to explain BLLs of children aged 0–10 years and adults aged 19 years or above. Then, using the estimated equations, we calculated BLLs for all individuals, including children aged 11–18 years and those in the other age groups who did not participate in the blood sampling.

BLLs generally depend on the ambient pollution level, the opportunities of exposure to pollution, the physiological capacity of lead absorption and excretion, and the knowledge and technologies used to prevent lead poisoning. We controlled for ambient pollution levels by including the distance, direction, and altitude of household location—the first two variables are with respect to the mine waste dumping site (Black Mountain). The remaining factors were measured by age and various other individual and household characteristics denoted by $${{\varvec{X}}}_{i}$$. Data for these variables are available regardless of participation in blood sampling. We assumed the following equation for BLL:

\begin{aligned} \log BLL_{i} & = \beta_{dis} \log distance_{i} + \beta_{dir1} direction_{i} + \beta_{dir2} direction_{i}^{2} \\ & \quad + \beta_{alt} altitude_{i} + f\left( {age_{i} } \right) + {\varvec{X}}_{i} \user2{\gamma^{\prime}} + \varepsilon_{i}. \hfill \\ \end{aligned}
(1)

The logarithmic form for BLL adjusts its distribution to approximately normal—BLL is bounded from below and has a skewed distribution—and allows the factors on the right-hand side to have proportional effects rather than level effects. $${\varepsilon }_{i}$$ is the independent and identically distributed error term that captures noise, such as casual fluctuations and measurement errors in BLLs, and the effects of unobservable factors. While we presented a single equation above, we assumed different equations for children aged 0–10 years and adults aged 19 years or above.

Below, we discuss our specification in detail.

### Geographic factors

Existing studies have examined the relationship between the geographic location and ambient pollution level12,13,14. Since lead is transported from the mine waste dumping site through the flow of wind and water, the distance from the site is negatively correlated with ambient lead levels. The soil lead contamination spreads to the western side of the site, particularly towards the west-northwest (WNW), which corresponds to the direction of the prevailing local wind. The contamination also slightly extends to the low-elevation south-eastern side, reflecting pollution transported by water. The northern and southern sides are the least contaminated.

We defined $$distanc{e}_{i}$$ as the distance between the mine waste dumping site and the location of $$i$$’s household, with $${\beta }_{dis}<0$$ expected. Also, we assumed that the WNW is the most contaminated and, accordingly, we defined $$directio{n}_{i}$$ as the radian of the acute angle passing through WNW, the mine waste dumping site, and the location of $$i$$’s household. That is, the household location is WNW at $$directio{n}_{i}=0$$, either north-northeast or south-southwest at $$\pi /2$$, and east-southeast (ESE) at $$\pi$$. We employed a quadratic specification in Eq. (1), which allows BLLs to have two peaks at WNW and ESE if $${\beta }_{dir1}<0$$, $${\beta }_{dir2}>0$$ and $$-{\beta }_{dir1}/\left(2{\beta }_{dir2}\right)<\pi$$. We statistically assessed the appropriateness of the specification for direction in Supplementary Material Section S2. We also used altitude in metres, $$altitud{e}_{i}$$, considering that elevated areas can be less exposed to dust and water flows, although the general tendency of land elevation can be absorbed by the direction variables.

### Age and other covariates

For children, we assumed a non-linear relationship between their ages and BLLs and defined the following functional form:

$$f\left( {age_{i} } \right) = \left[ {\phi_{0} + \phi_{1} mage_{i} + \phi_{2} mage_{i}^{2} } \right] \times I\left( {age_{i} < 2} \right) + \phi_{3} age_{i} \times I\left( {age_{i} \ge 2} \right).$$
(2)

$$I\left( \cdot \right)$$ is an indicator function that takes the value of 1 if the argument condition is satisfied, and $$mag{e}_{i}$$ denotes age in months. The functional form reflects the findings in the literature. Young children are generally at a high risk of lead poisoning. Playing outside and age-appropriate hand-to-mouth behaviours expose them to lead, and their gastrointestinal absorption of lead is high4. Foetuses and infants born to exposed mothers absorb lead in utero and through breastfeeding35. Consequently, BLLs often reach a peak at or before the age of 24 months and then decrease as children grow older, reflecting their physical and behavioural growth1,36. Thus, we employed a specification that allows an inverted U-shaped relationship between the logarithmic BLL and age up to 23 months, but assume a linearly decreasing relationship between the two factors for children aged 2 years or above.

For adults, the physiological foundation of the BLL-age relationship is not clear, but age-related changes in metabolism and lifestyle can affect BLLs. We simply assumed a log linear relationship between BLL and age for adults.

In addition, we used the following individual and household characteristics, denoted as $${{\varvec{X}}}_{i}$$, for children: a dummy variable for female; the mothers’ education level (grades), which reflects their general, health-related and lead-related knowledge; a dummy variable for children whose mothers were absent (the mothers’ education level was set at zero for such children); a dummy variable for female-headed households; household size; dependency ratio (the proportion of household members aged 0–15 years and 65 years or above); and the log of per capita household expenditure, which measures living standards. We also used dummy variables for household location: urban areas, small-scale farming areas, large-scale farming areas, and the Makululu compound—an area of informal settlement where public services are poorly delivered. We set urban area as the base category.

For adults, we continued to use the dummy variables for female and household location, household size and dependency ratio but dropped the variables related to mothers and household heads. The per capita household expenditure was not used, either, because it is not exogenous for adults. Instead, we used their own education level, which reflects living conditions to certain extent as well as knowledge levels. We also used a dummy variable for marital status, which takes the value of one for either married or co-habiting individuals, and the duration of residence in Kabwe (in years) to account for the effects of long-term lead exposure.

### Econometric methods to estimate BLL equation

We considered two methods to estimate Eq. (1). The first one is OLS, which directly estimates Eq. (1) from the data of the participants in the blood sampling survey. If the bias in BLLs are attributable to the difference in observable factors between the participants and non-participants, then the OLS estimate of Eq. (1) is unbiased and can be used to obtain estimates representing the population. However, as previously mentioned, unobservable characteristics can also affect both BLLs and participation decisions. This can disrupt the error term distribution and bias the OLS estimate of Eq. (1).

To account for this risk, we also adopted Heckman’s sample selection model24. This model corrects for the bias in unobservable factors by simultaneously estimating the probability of participation (selection equation) for the entire sample, including non-participants. Specifically, we considered the following selection equation:

\begin{aligned} \Pr \left( {i\;participates} \right) & =\Psi \{ \delta_{dis} \log distance_{i} + \delta_{dir1} direction_{i} + \delta_{dir2} direction_{i}^{2} \\ & \quad + \delta_{alt} altitude_{i} + g\left( {age_{i} } \right) + {\varvec{X}}_{i} \user2{\xi^{\prime}} + \zeta window_{i} \} , \\ \end{aligned}
(3)

where $$\Psi$$ is the normal distribution function with the probability density function of $$\psi$$, $${\varvec{X}}_{i}$$ is the same as in Eq. (1), and $$g\left( {age_{i} } \right)$$ has the functional forms identical to $$f\left( {age_{i} } \right)$$. The bias in Eq. (1) can be fixed by estimating Eq. (1) with the inverse Mills ratio, $$\psi /\Psi$$.

In the sample selection model, the use of an exclusion restriction variable, which affects the probability of participation but not BLL, is preferable. We used the number of days of the blood sampling window denoted by $$windo{w}_{i}$$ as an exclusion restriction. As described above, the blood sampling window was effectively the number of days that the assigned clinic remained operational for blood sampling after the day following the invitation. Other factors being equal, households that received early invitations and had longer time windows would more easily manage to attend clinics and would have higher probabilities of participation. The exogenous nature of the blood sampling window renders it irrelevant for BLLs.

### Estimation of the representative BLLs

After obtaining the BLL equations, we estimated the BLLs of the representative sample individuals by inputting their characteristics on the right-hand side of the equations. We applied the survey’s sampling weights when aggregating the estimated BLLs.

To estimate the BLLs of adolescents aged 11–18 years, who were basically not covered in our BLL survey and thus not used in the BLL equation estimations, we used the equation for children aged 0–10 years, assuming that age-BLL trend, which we expected to be negative, would hold up to the age of 18 years.

Next, we calculated the number of the residents with BLLs above 5 μg/dL by interacting the estimated proportion of those with such BLLs and the total population. Considering the population growth, we used the population estimates of our own33 and the Central Statistical Office of Zambia37, both as of 2017, instead of 200,000 as of 2010.

Further, we present two graphical results. The first one is an in-depth examination of the mean BLLs across age groups. In the second one, we simulated the geographic variation of the mean BLLs. We divided the entire Kabwe district into 1 km × 1 km grids, and estimated the mean BLL in each grid cell. Distance and direction were measured for each cell and other independent variables were measured by the means in the ward—official inner-district division—to which the cell corresponds (we provide additional technical notes before showing results).

All estimations were performed using Stata 15 software.

## Results

### Observed BLL data

The observed mean BLL among the participants in the blood sampling survey was 15.9 μg/dL, in which we did not make econometric adjustments (Table 1). The 50 percentile (median) BLL was 11.3 μg/dL, indicating a skewed distribution. Male BLLs tended to be higher than female ones. BLLs were generally negatively associated with age. Overall, approximately 5.3% of the participants reported BLLs exceeding 45 μg/dL. This proportion was 14.2% among children aged 0–5 years, but only nine adults (2.0%) reported such high BLLs. The observation size for those aged 11–18 years were small.

### Characteristics of blood sampling participants and non-participants

The characteristics of the participants and non-participants in blood sampling were not identical. Among children aged 0–10 years, the two groups significantly differed in terms of household location, size and living standards, with P values below 0.10 (Table 2). Among adults, the characteristics of the two groups were more clearly distinct, with P values mostly below 0.01 (Table 3). Therefore, the participants in blood sampling were not a random subset of our study target. Their BLLs (Table 1) can fail to represent the lead poisoning conditions of the population.

### Estimated BLL equation for children

In the BLL equation estimation based on OLS (Table 4, column I), the coefficients of the distance and direction variables had expected signs with P values below 0.01. BLL was decreasing in the distance, whereas the relationship between BLL and direction was U-shaped, with the highest peak at WNW, the lowest peaks at northeast and south ($$directio{n}_{i}\approx 5\pi /8$$), and a small peak at ESE. The explanatory powers of distance and direction were so large that R2 remained at 0.67 even after dropping other independent variables. Considering the strong powers of these factors and given that the values of these variables were similar among neighbouring households, we clustered standard errors for SEAs (in all the subsequent estimations as well). Altitude did not have a significant effect.

Age also had a significant effect. BLL peaked at 16.5 months, which is close to the average age of children to stop breastfeeding in Kabwe, 15.8 months33. This suggests a role of lead transfer through breastfeeding. BLL decreased by approximately 5% per year from the age of two years.

Among other factors, the dependency ratio raised BLLs, albeit with a marginally significant P value of 0.07. This suggests the possibility that parents in households with high dependency ratios failed to take sufficient precautionary measures for lead exposure. Mothers’ education reported a negative coefficient but its effect was insignificant with a P value of 0.10. Similarly, the per capita household expenditure did not have a significant coefficient.

Under Heckman’s sample selection model, the probability of participation significantly depended on age and household size (Table 4, column II). Although household income per capita reported significantly different means between the participants and non-participants (Table 2), its effect on participation was insignificant after other factors were controlled for. Conversely, while the mean age was almost identical between the two groups (Table 2), age had a significant non-linear effect on the probability of participation. The exclusion restriction, the duration of blood sampling window, had a significant effect with a P value below 0.01. However, the resulting BLL equation was similar to the OLS estimate (Table 4, column III). The inverse Mills ratio did not have a significant effect on BLLs with the P value of selection bias greater than 0.10. Therefore, selection bias was limited in terms of unobservable factors and the OLS estimate of the BLL equation was not significantly biased.

### Estimated BLL equation for adults

Under OLS, the effects of distance and direction were similar to those for children: BLL decreased with distance and had a U-shape relationship with direction, reaching the lowest levels in the northeast and south (Table 5, column I). Altitude had a negative coefficient, but was not significant with a P value above 0.10. Age and being female had significantly negative effects, although the marginal effect of age was moderate compared to that for children, approximately 0.5% per year. Own education also had a significantly negative effect on BLL, suggesting that knowledge or living conditions indicated by education levels affected adult BLLs. Duration of residence in Kabwe significantly increased BLLs.

The remaining columns show the results under Heckman’s sample selection model. The participation decisions of adults depended on various individual and household characteristics. Those with high levels of education and from large households were less likely to participate, whereas older adults, women, and those either married or co-habiting, having resided in Kabwe for a prolonged period, and from households with high dependency ratios were more likely to participate. The duration of the blood sampling window significantly increased the probability of participation. However, similar to the results for children, the inverse Mills ratio did not have a significant effect with P value above 0.10.

### Representative estimates of lead poisoning conditions

We estimated the BLLs of 4,898 individuals, all but two sample individuals who had missing information, that represent the lead poisoning conditions of the entire population (Table 6). Since the selection bias in terms of unobservable factors was not significantly observed (Tables 4, 5), we used the BLL equations obtained under OLS. All figures hereafter were weighted by the survey’s population weights.

The representative mean BLL was 11.9 μg/dL, with a 95% confidence interval of 11.6–12.1 μg/dL, which is 2.4 times higher than the standard reference level of 5 μg/dL. 74.9% of the residents had BLLs above 5 μg/dL. This proportion, as of 2017, corresponds to approximately 202,500 individuals based on our population estimate, 270,389 individuals33, and to approximately 170,400 individuals based on the relatively moderate population projection of 227,551 individuals by the Central Statistical Office of Zambia37. The 50 percentile (median) was 8.7 μg/dL. Men had significantly higher BLLs than women (the P value for zero difference is below 0.01). Notably, only 9.6% of children aged 0–5 years and 9.8% of children aged 6–10 years had BLLs below 5 μg/dL, although this study expanded the focus beyond the immediate neighbourhood of the mine waste dumping site. 4.6% of children aged 0–5 years had BLLs above 45 μg/dL, but our estimates did not predict such high BLLs for adolescents aged 11–18 years and adults.

Figure 1 depicts the in-depth relationship between the estimated BLLs and age. After peaking within the ages of 12–23 months, BLLs for children demonstrated a declining trend with age, albeit with fluctuations. Note that the BLLs of those aged 18 years and 19–29 years were continuously connected. This suggests that we successfully estimated the BLLs of those aged 11–18 years from the equation for children aged 0–10 years.

Figure 2 illustrates the simulated geographic distributions of BLLs, separately for children (a) and adults (b). To obtain the figure for children, we set age at 16 months, when BLL reaches the maximum. Thus, the figure for children can be considered the geographic distribution of the maximum BLL that a child with average traits is expected to report. Age is set at the local mean for adults, approximately 34–38 years.

For both children and adults, BLLs were high in WNW and ESE. BLLs greater than 45 μg/dL were found in the neighbourhood of the mine waste dumping site. BLLs tended to decrease with distance. However, the BLLs of children, at the maximum, exceeded 5 μg/dL throughout most areas.

## Discussion

This study estimated BLLs representative of the lead poisoning conditions among the entire population of Kabwe, Zambia, using the combined dataset of the ICP-MS measures of BLLs and a socioeconomic household survey. As in the previous studies on Kabwe and other health surveys in general, we were faced with the risk of selection bias in the BLL data in terms of both observable and unobservable factors, owing to non-random participation in the blood sampling survey. To overcome this problem, we employed econometric methods that controlled for differences in observable and unobservable factors between participants and non-participants in the blood sampling survey.

Our estimates showed that the mean BLL for the population was 11.9 μg/dL (Table 6), which is 25.2% lower than the mean of the observed BLLs of the participants (15.9 μg/dL, Table 1). While unobservable factors reported a minor bias (Tables 4, 5), the observable factors were not identical between the participants and non-participants (Tables 2, 3). In particular, the participants (or their parents) tended to have lower education levels and resided in Kabwe for a prolonged period, which were factors positively associated with BLLs. The age composition and household location were also different. These differences led to higher BLLs among the participants. Our estimate of the mean BLL was also lower than the ones in existing studies15,19,20,21, mainly because their focus was placed mostly on pollution hotspots, but their data could be faced by selection bias similar to our observed data. Further, both our estimated and observed mean BLLs were lower than our early results based on LeadCare II analyser22. Although LeadCare II analyser is considered fairly accurate, our samples included higher BLLs than ones to which LeadCare II is often applied, and this apparently led to overestimation of BLLs (see the Supplementary Material Section S2).

Nevertheless, our results illustrate the devastating lead poisoning problems in Kabwe. We confirmed critically high BLLs among children residing in the most contaminated areas. Further, the mean BLL of our estimates was considerably higher than the standard reference level of 5 μg/dL, and the proportion of those with BLLs above this level amounted to 74.9%. Based on our population estimate as of 201733, this proportion corresponds to 202,500 individuals (or 170,400 based on another population estimate37), which is greater than an existing estimate of 120,000 in the TSIP23.

These estimates provide a foundation for policy intervention designs. Since lead poisoning was widespread across the entire Kabwe district, interventions that span across the entire population are required. Thus, although immediate interventions, such as chelation therapy proposed under a World Bank project38, could focus on pollution hotspots, interventions to reduce lead transportation, such as capping the mine waste dumping site with concrete or clean soil, would be of fundamental importance. Our estimates also provide grounds for proper cost–benefit evaluations of interventions. For large-scale interventions, the benefits for the entire population, not only the residents in hotspots, need to be accounted for, and this requires population-level data. Proper cost–benefit evaluations are important for sustainability of interventions as they require large costs and long-term commitment (e.g. monitoring and maintenance).

Our methodology has an implication for other cases of health studies. Medical and clinical data collected through voluntary participation in testing can be subject to analogous selection bias problems to ours, particularly in cases in which formal and compulsory testing schemes are lacking. The extent of disruptions caused by selection bias can vary by case, and our econometric specifications would require modifications if applied to other cases. Nevertheless, the principle of our approach—collection of background data from the representative sample individuals, including those who did not participate in the medical testing, and correction of deviation in the characteristics of the participants—is applicable to various cases in which selection bias is a concern.

Finally we address the limitations of our study. First, our methodology was not employed to perfectly predict the BLL of each individual. Our estimates reflected variations of BLLs by gender, age groups, areas within Kabwe and various other factors but did not fully reflect idiosyncratic variations. Certain individuals with particular idiosyncratic factors can have high or low BLLs even if their traits and residential locations are associated with low or high BLLs. The second limitation, related to the first one, is the general difficulty to econometrically predict extreme outcomes. Such outcomes are scarce and idiosyncratic factors prevail over systematic ones. In our case, a small proportion of adults did report such BLLs, but our estimates did not predict such BLLs for adults. Finally, while we employed BLL as the measure, a comparison with alternative measures would improve the understandings of the lead poisoning problem in Kabwe. For example, bone and tooth conditions would reflect the effects of long-term lead exposure better, and clinical conditions would reflect idiosyncratic variations in the sensitivity to lead intake. Analysing these alternative measures could be the topic for further research.

## Data availability

The datasets used in the current study are not publicly available based on the ethical approvals from the University of Zambia Research Ethics Committee, the Ministry of Health of Zambia and the Kabwe District Medical Office.

## References

1. 1.

Canfield, R. L. et al. Intellectual impairment in children with blood lead concentrations below 10 μg per deciliter. N. Engl. J. Med. 348, 1517–1526 (2003).

2. 2.

Dapul, H. & Laraque, D. Lead poisoning in children. Adv. Pediatr. 61, 313–333 (2014).

3. 3.

Meyer, P. A., Brown, M. J. & Falk, H. Global approach to reducing lead exposure and poisoning. Mutat. Res. 659, 166–175 (2008).

4. 4.

World Health Organization (WHO). Childhood Lead Poisoning (WHO Press, Geneva, 2010).

5. 5.

Aizer, A., Currie, J., Simon, P. & Vivier, P. Do low levels of blood lead reduce children’s future test scores?. Am. Econ. J. Appl. Econ. 10, 1–36 (2018).

6. 6.

Aizer, A. & Currie, J. Lead and juvenile delinquency: new evidence from linked birth, school and juvenile detention records. Rev. Econ. Stat. 101, 575–587 (2019).

7. 7.

Miranda, M. L. et al. The relationship between early childhood blood lead levels and performance on end-of-grade tests. Environ. Health Persp. 115, 1242–1247 (2007).

8. 8.

Needleman, H. Lead poisoning. Annu. Rev. Med. 55, 209–222 (2004).

9. 9.

Attina, T. M. & Trasande, L. Economic costs of childhood lead exposure in low- and middle-income countries. Environ. Health Perspect. 121, 1097–1102 (2013).

10. 10.

Dooyema, C. A. et al. Outbreak of fatal childhood lead poisoning related to artisanal gold mining in northwestern Nigeria, 2010. Environ. Health Perspect. 120, 601–607 (2012).

11. 11.

Clark, A. R. L. Placental transfer of lead and its effects on the newborn. Postgrad. Med. J. 53, 674–678 (1977).

12. 12.

Kříbek, B. et al. Soil contamination near the Kabwe Pb-Zn smelter in Zambia: environmental impacts and remediation measures proposal. J. Geochem. Explor. 197, 159–173 (2019).

13. 13.

Nakayama, S. M. M. et al. Metal and metalloid contamination in roadside soil and wild rats around a Pb–Zn mine in Kabwe, Zambia. Environ. Pollut. 159, 175–181 (2011).

14. 14.

Tembo, B. D., Sichilongo, K. & Cernak, J. Distribution of copper, lead, cadmium and zinc concentrations in soils around Kabwe town in Zambia. Chemosphere 63, 497–501 (2006).

15. 15.

Yabe, J. et al. Lead poisoning in children from townships in the vicinity of a lead–zinc mine in Kabwe, Zambia. Chemosphere 119, 941–947 (2015).

16. 16.

Blacksmith Institute & Green Cross Switzerland. The World’s Worst 2013: Top Ten Toxic Threats. https://www.worstpolluted.org/docs/TopTenThreats2013.pdf (Pure Earth, 2013).

17. 17.

Human Rights Watch. We Have to Be Worried: The Impact of Lead Contamination on Children’s Rights in Kabwe, Zambia. https://www.hrw.org/report/2019/08/23/we-have-be-worried/impact-lead-contamination-childrens-rights-kabwe-zambia (2019).

18. 18.

Centers for Disease Control and Prevention (CDC). Low Level Lead Exposure Harms Children: A Renewed Call for Primary Prevention (CDC, Atlanta, 2012).

19. 19.

Bose-O’Reilly, S. et al. Lead intoxicated children in Kabwe, Zambia. Environ. Res. 165, 420–424 (2018).

20. 20.

Caravanos, J., Fuller, R. & Robinson, S. Notes from the field: severe environmental contamination and elevated blood lead levels among children—Zambia, 2014. Morb. Mortal. Wkly. Rep. 63, 1013 (2014).

21. 21.

Mbewe, A., Muneku, E., Michelo, C. & Siulapwa, Y. Blood lead levels and likely adverse health outcomes in under five children in Kabwe, Zambia. Int. J. Sci. Basic Appl. Res. 24(5), 171–183 (2015).

22. 22.

Yabe, J. et al. Current trends of blood lead levels, distribution patterns and exposure variations among household members in Kabwe, Zambia. Chemosphere 243, 125412 (2020).

23. 23.

Toxic Sites Identification Program (TSIP). https://www.pureearth.org/projects/toxic-sites-identification-program-tsip/ (Pure Earth).

24. 24.

Heckman, J. J. Sample selection bias as a specification error. Econometrica 47, 153–161 (1979).

25. 25.

Rosenbaum, P. R. & Rubin, D. B. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55 (1983).

26. 26.

Duflo, E., Glennerster, R. & Kremer, M. Using randomization in development economics research: a toolkit. In Handbook of Development Economics, Vol. 4 (eds Schultz, T. P. & Strauss, J. A.) 3895–3962 (Elsevier, Amsterdam, 2007).

27. 27.

Gluud, L. L. Bias in clinical intervention research. Am. J. Epidemiol. 163, 495–501 (2006).

28. 28.

Tripepi, G., Jager, K. J., Dekker, F. W. & Zoccali, C. Selection bias and information bias in clinical research. Nephron Clin. Pract. 115, c94–c99 (2010).

29. 29.

The Lancet. Global Burden of Disease Study. https://www.thelancet.com/gbd (2020).

30. 30.

Landrigan, P. J. & Fuller, R. Environmental pollution: an enormous and invisible burden on health systems in low-and middle-income counties. World Hosp. Health Serv. 50(4), 35–40 (2012).

31. 31.

Landrigan, P. J. et al. The Lancet commission on pollution and health. The Lancet 391, 462–512 (2018).

32. 32.

Prüss-Ustün, A., Vickers, C., Haefliger, P. & Bertollini, R. Knowns and unknowns on burden of disease due to chemicals: a systematic review. Environ. Health 10(9), 1–15 (2011).

33. 33.

Hiwatari, M. et al. Kabwe Household Socioeconomic Survey (KHSS) 2017 Report (KAbwe Mine Pollution Amelioration Initiative, 2018). On-line version available at https://satreps-kampai.vetmed.hokudai.ac.jp/images/publications/2018/The%20Kabwe%20Household%20Socioeconomic%20Survey%20(KHSS)%20Report%20(WEB).pdf.

34. 34.

Nakata, H. et al. Reliability of stable Pb isotopes to identify Pb sources and verifying biological fractionation of Pb isotopes in goats and chickens. Environ. Pollut. 208, 395–403 (2016).

35. 35.

Centers for Disease Control and Prevention (CDC). Guidelines for the Identification and Management of Lead Exposure in Pregnant and Lactating Women (CDC, Atlanta, 2010).

36. 36.

Dietrich, K. N., Berger, O. G., Succop, P. A., Hammond, P. B. & Bornschein, R. L. The developmental consequences of low to moderate prenatal and postnatal lead exposure: intellectual attainment in the Cincinnati Lead Study cohort following school entry. Neurotoxocol. Teratol. 15, 37–44 (1993).

37. 37.

Central Statistical Office of Zambia. Population and Demographic Projections 2011–2035 (Central Statistical Office, Lusaka, Zambia, 2013).

38. 38.

World Bank. Zambia—Mining and Environmental Remediation and Improvement Project. https://projects.worldbank.org/en/projects-operations/project-detail/P154683 (2020).

## Acknowledgements

We are indebted to participants in our surveys in Kabwe. We are also grateful to the Central Statistical Office and Ministry of Health of Zambia, the Kabwe District Health Office, and laboratory technicians and nurses at health clinics in Kabwe. The study of this paper was conducted as a part of KAbwe Mine Pollution Amelioration Initiative (KAMPAI) project supported by Science and Technology Research Partnership for Sustainable Development (SATREPS; No. JPMJSA1501), a collaboration between Japan Science and Technology Agency (JST) and Japan International Cooperation Agency (JICA). This paper was also supported by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan awarded to D. Y., M. H., D. N. (No. 19KK0040), S. N. (Nos. 16K16197, 17KK0009) and M. I. (Nos. 16H0177906, 18K1984708).

## Author information

Authors

### Contributions

All authors made contribution in data collection. J.Y., S.N. and H.N. analysed blood samples and D.Y., M.H., P.H., D.N., C.M. and B.C. conducted data curation and analysis. The study was conceptualised by K.C. and M.I., supervised by M.H., C.M., J.Y. and S.N. and administrated by H.N. The manuscript was drafted by D.Y. and M.H., edited by P.H. and D.N., and critically revised and approved by all authors.

### Corresponding author

Correspondence to Daichi Yamada.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

### Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

Reprints and Permissions

Yamada, D., Hiwatari, M., Hangoma, P. et al. Assessing the population-wide exposure to lead pollution in Kabwe, Zambia: an econometric estimation based on survey data. Sci Rep 10, 15092 (2020). https://doi.org/10.1038/s41598-020-71998-5

• Accepted:

• Published:

• ### The impact of elevated blood lead levels in children on maternal health-related quality of life

• Hokuto Nakata
• , Harukazu Tohyama
• , Wakako Fujita
• , Shouta M.M. Nakayama
• , Mayumi Ishizuka
• , John Yabe
• , Nosiku S. Munyinda
• , Doreen Sakala
• , Kennedy Choongo
• , Shojiro Yamasaki
• , Natsumi Nagai
• , Takahiko Yoshida
•  & Takeshi Saito

Chemosphere (2021)

• ### Assessment of LeadCare® II analysis for testing of a wide range of blood lead levels in comparison with ICP–MS analysis

• Hokuto Nakata
• , Shouta M.M. Nakayama
• , John Yabe
• , Kaampwe Muzandu
• , Haruya Toyomaki
• , Yared Beyene Yohannes
• , Andrew Kataba
• , Golden Zyambo
• , Yoshinori Ikenaka
• , Kennedy Choongo
•  & Mayumi Ishizuka

Chemosphere (2021)