
Identifying US populations for the study of health effects related to drinking water arsenic


The US Environmental Protection Agency recently set a new maximum contaminant level (MCL) for arsenic in drinking water of 10 μg/l. In this paper, we review the completeness and accuracy of drinking water arsenic occurrence data in the United States and identify populations exposed to elevated arsenic concentrations that would be suitable for epidemiological studies of arsenic health effects. Using existing data from the Environmental Protection Agency Arsenic Occurrence and Exposure Database and additional data from state health and environment departments and water utilities, we identified 33 counties in 11 states with an estimated mean drinking water arsenic concentration of 10 μg/l or greater. A total of 11 of these ‘confirmed’ counties had an estimated mean arsenic concentration of 20 μg/l or more and two had an estimated mean arsenic concentration of 50 μg/l or more. Based on census data, between 1950 and 1999 there were approximately 51.1 million person-years of exposure to drinking water arsenic at levels of 10 μg/l or more, 8.2 million at levels of 20 μg/l or more and 0.9 million at levels of 50 μg/l or more. Mortality and incidence of diseases known to be associated with arsenic exposure can and should be examined in these counties as part of a comprehensive assessment of arsenic health effects in US populations.


The issue of elevated health risks from arsenic in United States drinking water supplies has received considerable attention in recent years. An analysis by the US Environmental Protection Agency (EPA) and two reports by the National Research Council (NRC) concluded that an arsenic maximum contaminant level (MCL) of 50 micrograms per liter (μg/l), in effect in the United States since 1974, is not sufficiently protective of public health (EPA, 1988; NRC, 1999; NRC, 2001). This led to the EPA proposing to lower the arsenic MCL to 10 μg/l and to the promulgation of the new standard in January 2001, with compliance by utilities required by January 2006.

Arsenic is an element that is present at low concentrations in nearly all aquatic and soil environments. Arsenic is the 20th most abundant element with a crustal abundance of 5 parts per million (ppm) and its concentration in seawater is about 1.5 μg/l (Emsley, 1990). Common natural sources of arsenic include sulfide minerals such as arsenopyrite (FeAsS), realgar (AsS) and orpiment (As2S3), as well as arsenic adsorbed onto ferrihydroxide surfaces (Cullen and Reimer, 1989). The primary source of arsenic in drinking water is leaching of arsenic from arsenic-bearing rocks (Jekel, 1994). Higher ground water arsenic concentrations are frequently associated with sedimentary deposits derived from volcanic rocks (Jekel, 1994). These occur primarily in western mountain ranges of states in the basin and range province (New Mexico, Utah, Arizona and Nevada) (US EPA, 2000). High waterborne arsenic concentrations have also been attributed to sulfide mineral deposits in sedimentary rocks in Michigan and northeastern Wisconsin. In the northeastern United States (Massachusetts to Maine), higher waterborne arsenic concentrations are frequently related to sulfide minerals in the bedrock.

The Natural Resources Defense Council (NRDC) estimates that over 34 million Americans drink water that increases their risk of developing an arsenic-related cancer (NRDC, 2000). The EPA estimates that 5.3% of ground water systems and 0.8% of surface water systems contain arsenic at levels exceeding 10 μg/l (EPA, 2000). The methods used by the NRDC to obtain their estimates of the number of people exposed to elevated levels of drinking water arsenic are unclear. The EPA, however, based their estimate of the number of drinking water systems that will be affected by the lower MCL on an extrapolation of available groundwater arsenic occurrence data from the EPA's Arsenic Occurrence and Exposure Database (AOED). Unfortunately, when this database was developed, arsenic occurrence data were available from only 25 of 50 states and were only available for water sources rather than for drinking water systems. Thus, there is considerable uncertainty about the number of drinking water systems affected by the revised drinking water MCL for arsenic and even greater uncertainty about the number of people exposed to elevated levels of waterborne arsenic.

The purpose of this report is to identify arsenic-exposed populations suitable for epidemiological studies of arsenic health effects. To address this issue we examined drinking water arsenic occurrence data from several sources, compared the data between sources, and conducted a detailed review of systems identified as having arsenic-contaminated drinking water sources.


Sources of Arsenic Occurrence Data

To identify counties with elevated arsenic levels, we used three primary sources: the EPA Arsenic Occurrence and Exposure Database (AOED), data we obtained directly from states, and data we obtained directly from utilities and counties. We also examined, but did not use in our estimates of arsenic exposure, the NRDC database and data included in a publication by Engel and Smith (1994).

EPA Arsenic Occurrence and Exposure Database

To estimate the national occurrence of arsenic in drinking water, the EPA developed the Arsenic Occurrence and Exposure Database (AOED) (US EPA, 2000). The AOED is based on information from the EPA Safe Drinking Water Information System (SDWIS) and from state compliance monitoring data sets.

The list of water utilities in the AOED is derived from the SDWIS. The SDWIS is an inventory of water systems by state; it contains no data on levels of contaminants in those systems, although it does list violations of maximum levels. For each system, the SDWIS contains the name, address, federal identification number, source water type, ownership category, population served, and regulation classification (system type). Although few large water systems are missing from the SDWIS, some small, privately owned drinking water systems not known to state regulatory agencies are not included. Systems that use both ground and surface water are classified as a surface source, since surface water sources have more stringent treatment requirements concerning removal or inactivation of pathogens and protection from disinfection by-products. The population served by a given system is based on the number of retail customers and does not include the number of people served by other water systems that purchase water from the utility. The population served is usually an estimate since neither the state nor the utility knows how many people reside in the residences served by the utility.

Arsenic occurrence data in the AOED are derived from state compliance-monitoring data. Each state maintains some form of a drinking water compliance-monitoring database. In its most simple form this database contains information on compliance with all current MCLs for regulated pollutants, including whether or not the water utility is in compliance with the previous arsenic MCL of 50 μg/l. Larger utilities are required to test more frequently than smaller utilities. Reported compliance with the MCL implies that analytical procedures with a method detection level (MDL) of less than the MCL were utilized. In some states the compliance-monitoring database also contains the actual concentration of the pollutant detected and, in fewer states, it identifies the analytical method and detection levels.

Arsenic occurrence data from state compliance-monitoring databases were provided to EPA by 32 states. The remaining states were not able to comply. Data from seven of the 32 submitting states were not considered suitable for inclusion in the AOED because: (1) arsenic concentrations were truncated at a relatively high arsenic level, such as 20–50 μg/l, (2) arsenic concentrations were not included, (3) system identification numbers were not included, or (4) data were submitted on paper rather than computer-readable media. Data from the remaining 25 states were included. Some states submitted multiple data sets. In that event, only the most recent data set was included.

The AOED contains data from most regions of the United States; however, relatively few states in New England, the Mid-Atlantic, and the Southeast are included. The states and the percentage of the state's ground water systems that are included in the database are shown in Table 1.

Table 1 State compliance monitoring data included in the EPA AOED.

Additional State, County, and Utility Data

To validate the AOED data, we obtained new raw data from selected state drinking water agencies, counties, and utilities. In addition to the states included in the AOED, we obtained data from some states not included in the AOED or NRDC databases: Alaska, Michigan, Nevada, Idaho, Oklahoma, Colorado, California, and Texas. Data from counties included: Pierce County, Washington; Union County, North Carolina; Ramsey County, North Dakota; and Green, Piatt and Gallatin counties, Illinois.

NRDC Database

The NRDC also has a database on arsenic occurrence. The NRDC database started with arsenic concentrations for individual sources. These data were originally obtained from the EPA and may have been updated from information supplied directly by utilities. Perhaps because the NRDC database was assembled by volunteers, little documentation is available on how the database was assembled, what quality assurance tests were used in assembling or evaluating the data or whether the original data obtained from the EPA were subsequently updated. We used this database only for comparative purposes against other data sources, and did not use it in determining our estimates of county arsenic exposures.

Engel and Smith study

One published study provided arsenic exposure estimates for 30 US counties (Engel and Smith, 1994). This study found an association between arsenic exposure and the risk of cardiovascular death. According to the authors, the arsenic exposure data were obtained by requesting drinking water arsenic data for public water systems from all 50 states, the District of Columbia, and Puerto Rico. The mean arsenic level per county was computed by weighting the mean arsenic level per water system by the size of the population served by the system. However, in several counties with large populations we found poor agreement between this study's reported county arsenic levels and those that we calculated from data in the AOED and NRDC database. In an attempt to resolve these differences, we contacted local public health authorities. We were unable to find evidence of elevated waterborne arsenic concentrations in three counties classified by the study as having elevated waterborne arsenic concentrations (Pierce County, Washington; Sierra County, New Mexico; and Iberia County, Louisiana). Although we did not use the arsenic exposure data from this study to generate estimates of mean arsenic exposure levels, we examined the data for comparative purposes.

Identification of Counties for Potential Studies of Health Effects

We chose to ascertain arsenic exposure levels by county rather than city or town, since mortality and cancer incidence data suitable for health effects studies are most commonly reported at the county level. Furthermore, there may be less misclassification of the location of residence of a cancer case if data are reported for a county rather than a city. For example, individuals residing close to the boundary of a city may be incorrectly reported to reside within that city.

We determined the mean arsenic concentration of potential study counties in the following manner:

(1) Using data we obtained from the state or the EPA AOED when state data were not available, we estimated a mean arsenic concentration for each water system. An average arsenic concentration was estimated for each source by determining the average value for all arsenic tests reported for that source. Since we lacked individual well production data, we assumed that each source contributed an equal amount of drinking water to the utility. This initial estimate of the mean arsenic concentration for the utility was, therefore, the mean arsenic concentration of each of the water sources for that utility.

(2) For utilities with a single source or with multiple sources having similar arsenic concentrations (within 5 μg/l), the arsenic concentration for that utility was assumed to be the average arsenic concentration of the sources.

(3) By aggregating our data into counties, we identified the counties likely to have a mean arsenic concentration of 10 μg/l or more. A higher priority was given to improving the accuracy of the estimated mean arsenic concentration for water utilities in these counties. If available, we obtained the mean arsenic exposure level for a utility from its Consumer Confidence Report. In other cases, we obtained the information by directly contacting the utility. Most of our efforts were directed at obtaining drinking water arsenic concentrations for systems serving more than 10,000 people. A total of 122 utilities with elevated but uncertain drinking water arsenic concentrations were identified; the uncertainty arose when a utility had multiple sources with different arsenic concentrations. We obtained arsenic concentrations from 69 (57%) of them. In total, 54 of the 122 systems served more than 10,000 people and we were able to obtain information on 43 (80%) of these. As suggested by these data, the systems that could not be contacted or that could not provide data served a very small population and often had only one part-time employee. When we could not obtain better information we assumed the mean arsenic concentration for the water system was the mean concentration of the different sources.

(4) Based on our best estimate of each water utility's mean arsenic concentration, we calculated the mean county arsenic concentration. Each water utility's mean arsenic concentration was weighted by the size of the population served, and the weighted values were combined across all water systems in the county to derive the estimated population-weighted mean county arsenic concentration.

(5) As a final check, after identifying the counties with a mean arsenic concentration of 10 μg/l or greater, we prepared a spreadsheet of all arsenic occurrence data from the state, EPA AOED, NRDC database, and water systems. This spreadsheet was checked to ensure that all data sources indicated that elevated arsenic concentrations were observed for most water sources in the county and that the mean arsenic concentration was not greatly influenced by one or a small number of sources with very high arsenic concentrations. No outlying observations were identified in the counties with elevated arsenic concentrations. We also attempted to exclude all water systems that use surface-derived drinking water because surface water sources generally have much lower arsenic concentrations and because it is relatively easy to modify a surface water treatment plant to remove arsenic.

We excluded counties from further consideration if we could not accurately estimate the mean arsenic concentration for water systems that serve more than 10,000 people. We also excluded from further consideration any counties in which fewer than 75% of the public wells serving at least 10,000 people had elevated arsenic concentrations (i.e. 10 μg/l or more).
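The averaging steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used to build the tables: the function names, the example utilities, and all numeric values are hypothetical, and the sketch assumes equal contribution from each source, as in step (1).

```python
from statistics import mean

def utility_mean(sources):
    """Mean arsenic (ug/l) for one utility.

    `sources` is a list of lists: one list of test results per source.
    Each source is first averaged over its tests; sources are then
    averaged with equal weight (no production data assumed available).
    """
    return mean([mean(tests) for tests in sources])

def county_mean(utilities):
    """Population-weighted county mean arsenic (ug/l).

    `utilities` is a list of (population_served, utility_mean) pairs.
    """
    total_population = sum(pop for pop, _ in utilities)
    return sum(pop * conc for pop, conc in utilities) / total_population

def share_elevated(well_means, threshold=10.0):
    """Fraction of wells whose mean arsenic is at or above `threshold`."""
    return sum(c >= threshold for c in well_means) / len(well_means)

# Hypothetical county with two utilities (values are illustrative only).
utility_a = utility_mean([[12, 14], [16]])   # two sources
utility_b = utility_mean([[8, 8]])           # one source
county = county_mean([(15_000, utility_a), (5_000, utility_b)])

print(utility_a, utility_b, county)
print(share_elevated([13, 16, 8]) >= 0.75)   # 75% screening rule
```

In this made-up example the county mean exceeds 10 μg/l, yet only two of three wells are elevated, so the 75% rule would exclude the county. That is the kind of case the final screening step is intended to catch.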

Estimation of Size of Exposed Populations

The size of the population served by the utilities in our database was compared to the 1990 county population estimates (US Bureau of the Census) to determine whether our capture of arsenic data reasonably covered the county population and could be used to estimate county exposure to elevated drinking water arsenic. We then estimated person-years of exposure available for study. Since computerized mortality data by county are available since 1950, we estimated the person-years of exposure for residents of the counties of interest for 1950–1999.
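One simple way to turn decennial census counts into person-years is sketched below. The text does not state how populations were handled between census years, so the linear interpolation used here is an assumption made for illustration, and the census figures are hypothetical.

```python
def person_years_by_decade(census):
    """Approximate person-years between consecutive census years.

    `census` maps census year -> county population. Population is
    assumed (for this sketch only) to change linearly between counts,
    so person-years = interval length * average of the two counts.
    """
    years = sorted(census)
    result = {}
    for start, end in zip(years, years[1:]):
        label = f"{start}-{end - 1}"
        result[label] = (end - start) * (census[start] + census[end]) / 2
    return result

# Hypothetical county populations at each census (illustrative only).
example_census = {1950: 10_000, 1960: 12_000, 1970: 15_000}
decades = person_years_by_decade(example_census)
print(decades)
print(sum(decades.values()))
```

Summing the decade values across all counties above a given arsenic threshold would give a total comparable in form to those reported by decade.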


Water Systems and Counties with Elevated Arsenic Concentrations

We identified all water systems with at least one well with an arsenic concentration at or above 10 μg/l using the EPA AOED. The number of wells with elevated arsenic by state is shown in Table 2. In the 25 states contributing data to the AOED, there are 243 water systems with at least one well with arsenic concentrations above 50 μg/l. Of the wells at or exceeding 10 μg/l arsenic, 93% are in systems that serve fewer than 2500 people and 98% are in systems that serve fewer than 10,000 people.

Table 2 Number of wells with elevated arsenic.

Table 3 shows the mean arsenic concentrations by county for our “confirmed” counties, comparing data from various sources. Our estimated mean arsenic concentrations for each county, derived according to the methods described above, are shown in the far right column of the table. Because of potential incompleteness of our data sources and the particular decision rules we applied in identifying counties for potential study, Table 3 may not include all US counties with elevated drinking water arsenic levels.

Table 3 Mean drinking water arsenic concentrations (μg/l) by data source for US counties with a mean arsenic concentration of 10 μg/l or greater.

Population Covered by Captured Arsenic Exposure Data

To ensure that the water systems used in our estimates accounted for most of that county's population, we compared the population served as estimated by the utilities in the county to the county's 1990 population as estimated by the US Census. These population size comparisons are presented in Table 4. The size of the population served by a water system was an estimate provided by the utility operator, and may not have been accurate in all instances. There appeared to be overcounting by some water systems, in that data indicate the number of people served was more than the number of people residing in the county in 1990. In general, however, this comparison of population sizes did not reveal any significant gaps in our data capture, with the exception of drinking water systems in Nevada for which the size of the population served was not available.

Table 4 Counties with arsenic concentrations 10 μg/l or greater.

Person-Years of Exposure

Table 5 summarizes the person-years of exposure for each decade (1950–1959, 1960–1969, 1970–1979, 1980–1989, 1990–1999) for all “confirmed” counties with arsenic concentrations at or greater than 10, 20, and 50 μg/l. In the United States for the 50-year period 1950–1999, there were over 51 million person-years of exposure to waterborne arsenic at or exceeding 10 μg/l, 8.8 million person-years of exposure to waterborne arsenic at or exceeding 20 μg/l, and 0.9 million person-years of exposure to waterborne arsenic at or exceeding 50 μg/l.

Table 5 Person-years of exposure by decade for counties with 10 μg/l or greater arsenic exposure.

Background US mortality rates for lung and bladder cancer combined in females rose from 27 per 100,000 in 1950–1959 to 49 per 100,000 by 1970–1979. Considering only arsenic-related cancers in females, assuming a baseline mortality rate of 40 per 100,000 for the period 1950–1999, a power of 80% and a significance level of 5%, we estimate that 450,000 person-years of exposure should be more than sufficient to detect an increased relative risk of 1.25 for these two cancers among people exposed to 50 μg/l or more drinking water arsenic. Sample sizes for specific years (1955–1975) for males and females with a range of relative risks of arsenic-associated cancer death are given in Table 6. This table should only be considered to be a general guide since the population size needed to detect an anticipated effect also depends on the level of in-migration and the net out-migration for age groups of interest. The level of migration increases for longer latency periods, reducing the power of the study to detect an arsenic-related health effect. Since people who leave an area may also return later, the net out-migration rate is the difference between the out-migration rate and the return migration rate.
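As a rough plausibility check on the 450,000 person-year figure, the sketch below uses a one-sample Poisson comparison against a known background rate with a normal approximation. This is an assumed simplification for illustration, not necessarily the calculation behind Table 6 (which uses a 4:1 unexposed-to-exposed ratio), and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def required_person_years(rate_per_100k, relative_risk,
                          alpha=0.05, power=0.80, two_sided=True):
    """Person-years needed to detect `relative_risk` against a known
    background rate, using a normal approximation to the Poisson
    distribution of observed deaths.

    Required expected deaths E under the null solve
        (RR - 1) * sqrt(E) = z_alpha + z_beta * sqrt(RR).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    expected_deaths = ((z_alpha + z_beta * sqrt(relative_risk))
                       / (relative_risk - 1)) ** 2
    return expected_deaths / (rate_per_100k / 100_000)

needed = required_person_years(40, 1.25)
print(round(needed))
print(needed < 450_000)
```

Under these assumptions the requirement comes out at roughly 340,000 person-years, which is consistent with the statement that 450,000 person-years should be more than sufficient. Migration and latency, as noted above, would raise the real requirement.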

Table 6 Sample size needed to detect a (Alpha=0.05, Power 0.90, ratio unexposed to exposed=4).


The purpose of this study was to identify populations that might form a base for an epidemiologic study of arsenic-related health risks and not to further characterize the exposed population. We believe that despite the shortcomings of the available data for estimating arsenic exposure levels, the data can accurately identify US counties where the average arsenic concentration is 10 μg/l or more. Our analysis and comparison of data sources identified 33 counties in 11 states in which the average arsenic concentration of at least 75% of public wells was 10 μg/l or more. Of these 33 “confirmed” counties, 12 had an average arsenic concentration of 20 μg/l or more and two had an average arsenic concentration of 50 μg/l or more.

These data show that nearly all water systems (98%) affected by the new 10 μg/l arsenic MCL serve fewer than 10,000 people. Since it appears that some small water systems have not performed arsenic testing or do not have arsenic concentrations listed in the AOED, it is likely that small systems will comprise an even higher fraction of all systems affected by the new regulation when more complete data become available.

Using the estimates of arsenic exposure derived from the EPA AOED and additional data we obtained directly from states, counties, and utilities, we estimate that between 1950 and 1999 there were 51.1 million person-years of exposure to arsenic at or exceeding 10 μg/l available for the study of arsenic health effects. There are 8.1 million person-years of exposure to arsenic at or exceeding 20 μg/l and 900,000 person-years of exposure to arsenic at or exceeding 50 μg/l in the United States.

Because few studies have examined the health effects from drinking water arsenic exposures in the United States and because almost one million person-years of observation are available for counties with 50 μg/l or more of drinking water arsenic, efforts should be made to compare the mortality rates for arsenic-related cancers in these counties to similar populations with lower levels of drinking water arsenic.


One of the factors complicating identification of arsenic-exposed populations is that large drinking water systems (10,000 or more customers) are more likely to rely on surface water or ground water under the influence of surface water than on true ground water sources. This may occur because few aquifers with low rates of water recharge have sufficient water to support large population centers. Surface water and shallow groundwater under the influence of surface water appear to have lower arsenic concentrations. Furthermore, utilities that rely on surface waters must provide water treatment, and it is relatively easy to modify an existing treatment plant to remove arsenic (McNeill and Edwards, 1995).

Although the percentage of water systems with arsenic monitoring data varies by state, 82% of all systems had arsenic concentrations listed in the AOED. Drinking water arsenic concentrations are based on recent analyses and may not reflect the levels in prior years. In several instances utilities have drilled new wells to find water with lower concentrations of arsenic. This was done to meet the 50 μg/l arsenic standard enforced in the early 1970s. More recently, cities have begun drilling new wells in anticipation of the lower arsenic MCL. Therefore, the exposure estimates presented likely underestimate historical exposure levels for these counties.

In addition to problems obtaining accurate drinking water arsenic concentrations from small utilities, there are a number of structural problems with the EPA AOED and state arsenic occurrence databases. For example, when one water system purchases drinking water from another system, only the seller and not the buyer must report drinking water arsenic levels to the state. In addition, it is not possible to estimate how many people drink the purchased water. Therefore, the size of the exposed populations presented here is probably an underestimate, especially for earlier decades.

For water sources with a measured arsenic concentration, the state databases seldom include a detection limit for the tests performed. Even when the method and detection limits are provided, they vary by analytical laboratory, state, and system within the state (EPA, 2000). The detection limits also change over time, sometimes declining and in other cases increasing (EPA, 2000). Detection limits were generally in the range of 2–20 μg/l arsenic (EPA, 2000). Although EPA developed a number of statistical approaches for analyzing these left-censored data (i.e., values below some detection limit), the lack of arsenic occurrence measurements of 3, 5, 10 or even 20 μg/l is an inherent limitation of the AOED.

A major limitation of the SDWIS, from which the AOED is derived, is that water production information for each source is not included. Therefore, it is not possible to identify which systems are served primarily by surface versus ground water sources or to determine whether a well with high arsenic levels is routinely used or is used only on an emergency basis. Since source production data were not available, inclusion of these wells with the assumption they contribute equally with other sources to the water system could result in errors in arsenic exposure estimates. This is one reason why we restricted our study to counties where most (75% or more) of the wells had elevated arsenic concentrations and why we contacted the larger utilities to obtain better estimates of system-wide arsenic concentrations.

Because of the lack of production data in the AOED and state databases, it is not possible to accurately estimate the mean arsenic concentration of water supplied to consumers for any system with two or more wells or sources with different arsenic concentrations. For example, a well with a low arsenic concentration may produce 80% of the drinking water and another well, with a high arsenic concentration, may produce only 20% of the water. The mean arsenic level for water delivered to customers should be a weighted average of the arsenic levels for the two wells, weighted by the relative contributions of the two sources. The EPA addressed this limitation by assuming that each source produced an equal amount of drinking water and the mean exposure was the arithmetic mean of the arsenic concentrations for each source. We made the same assumption for our preliminary identification of arsenic-exposed areas. We attempted to validate our estimates of drinking water arsenic concentrations for water systems by contacting utilities directly, with moderate success.
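The effect of the equal-production assumption can be seen in a small numeric sketch. The 80%/20% production split comes from the example above; the two well concentrations (5 and 40 μg/l) are hypothetical values chosen only to illustrate the size of the possible error.

```python
def equal_weight_mean(concentrations):
    """Arithmetic mean across sources (the equal-production assumption)."""
    return sum(concentrations) / len(concentrations)

def production_weighted_mean(concentrations, production_shares):
    """Mean weighted by each source's share of water produced."""
    total_share = sum(production_shares)
    weighted = sum(c * s for c, s in zip(concentrations, production_shares))
    return weighted / total_share

# Hypothetical two-well system: low-arsenic well supplies 80% of the water.
concentrations = [5.0, 40.0]       # ug/l, illustrative values only
production_shares = [0.8, 0.2]

print(equal_weight_mean(concentrations))
print(production_weighted_mean(concentrations, production_shares))
```

With these illustrative numbers the equal-weight estimate is nearly double the production-weighted one, which shows why utilities with sources of differing concentrations were contacted directly where possible.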

Actual well production data can be difficult to interpret, since production can vary seasonally and may also change over time. It might be possible, however, for states to include the design capacity of the well or surface sources as part of the state drinking water databases. This information could be used to estimate the water production from each source, thus greatly improving the value of the data.

Given the high costs of compliance with the new arsenic MCL, EPA should consider recommending that states develop better information systems for drinking water contaminants. At the very least, the majority of states should be able to provide contaminant level data to the EPA and provide the data on computer-readable media. Furthermore, the state databases should contain laboratory findings for contaminant levels from all water systems required to conduct such tests. Data with high levels of missing values, or truncated laboratory data that only indicate whether the water system met the current MCL, are of limited value in estimating human exposure levels and in evaluating the public health implications of drinking water contaminants. Having comprehensive data on contaminant concentrations will assist in the re-evaluation of health risks from drinking water contaminants as new health effects data become available.


We have performed a careful review and cross-validation of arsenic occurrence data in existing data sources and verified and expanded those sources through the collection of data directly from states, counties, and water systems. As a result, we have identified 33 counties with elevated arsenic exposure and believe these counties are suitable for studies of arsenic-related health effects. The populations identified should be followed to determine if there is evidence of elevated mortality or incidence of arsenic-related health effects.


  1. Agency for Toxic Substances and Disease Registry (ATSDR). Toxicological profile for arsenic. CAS No. 7440-38-2, September, 2000, p. 237.

  2. Cullen W.R., and Reimer K.J. Arsenic speciation in the environment. Chem Rev 1989: 89: 713–764.

  3. Engel R.R., Smith A.H. Arsenic in drinking water and mortality from vascular disease: an ecological analysis in 30 counties in the United States. Arch Environ Health 1994: 49: 418–427.

  4. Frost F.J., Tollestrup K., Craun G.F. Center for Pharmacoeconomic and Outcomes Research, Lovelace Respiratory Research Institute, May 6, 2002, unpublished manuscript.

  5. Jekel M.R. Removal of arsenic in drinking water treatment. In: Nriagu J.O. (Ed.) Arsenic in the Environment Part 1: Cycling and Characterization. Advances in Environmental Science and Technology, Vol 26. John Wiley & Sons, New York, 1994.

  6. McNeill L.S. and Edwards M.A. Soluble arsenic removal at water treatment plants. J Am Water Works Assoc 1995: 87(4): 105–113.

  7. National Research Council. Arsenic in Drinking Water. Subcommittee on Arsenic in Drinking Water, National Research Council, National Academy Press, Washington, DC, 1999.

  8. National Research Council. Update of the 1999 Arsenic in Drinking Water Report. Committee on Toxicology, Board on Environmental Studies and Toxicology, National Research Council. National Academy Press, Washington, DC, 2001.

  9. NRDC (Natural Resources Defense Council). Arsenic and old laws. A scientific and public health analysis of arsenic occurrence in drinking water, its health effects, and EPA's outdated arsenic tap water standard (by Paul Mushak, Ph.D.) February 2000 (URL:

  10. US EPA. Arsenic occurrence in public drinking water supplies. EPA-815-R-00-023, Office of Ground Water and Drinking Water, Environmental Protection Agency, Washington DC, December, 2000.


The authors are grateful to Judith Hurley for review and revision of the manuscript. This study was funded by the American Water Works Association Research Foundation.

Author information



Corresponding author

Correspondence to Floyd J Frost.

Frost, F., Muller, T., Petersen, H. et al. Identifying US populations for the study of health effects related to drinking water arsenic. J Expo Sci Environ Epidemiol 13, 231–239 (2003).


  • arsenic
  • exposure assessment
  • drinking water
  • groundwater
