Integrating ethical and social issues

Data on human genetic variation are being generated and used to better understand human origins, susceptibility to illness and genetic causes of disease. The US National Human Genome Research Institute (NHGRI) recently proposed the next stage in this work to carry forward and expand these goals and to reaffirm a commitment, present since the start of the Human Genome Project, that appropriate uses of this information will be based on ethical, legal and social science analysis1.

The history of destructive episodes in genetic research makes this attention to the ethical and social implications of genomics research essential2. This is especially true of human genetic variation research, because it provides the opportunity to find the genetic basis of individual and group differences. The consideration of ethical, legal and social implications (ELSI) of genetic research will not be maximally effective if it separates the creation of knowledge from its uses or if it sees the solution to appropriate uses of science as coming from a “cohort of scholars in ethics, law, social science, clinical research, theology, and public policy”1 rather than emerging with and from the science. Thus, ELSI analysis should be integrated into science, with participation of scientists; should be conducted proactively, rather than after scientific research projects are conducted; and should anticipate and monitor applications of research. A collaborative effort that centrally involves scientists and dialog among many scientific communities is necessary to shape science for responsible uses, because the way in which science is designed and carried out fundamentally affects how it can be used. Too often, the mere availability of data and technology, rather than ethical considerations or social needs, drives their use in unintended ways; therefore, scientists need to be aware of, and involved in thinking about, downstream uses at the earliest stages of research.

NHGRI is a leader in the concept of ELSI analysis and recently involved scholars from a diversity of backgrounds in planning large-scale projects such as the HapMap3. Scholars from anthropology, law, ethics and other disciplines have had input in the earliest stages of designing, carrying out and reporting genetic research intended to identify genes involved in diseases, protection against illness and responses to drugs4. This multidisciplinary approach was perceived to 'slow' the research while issues such as informed consent, community consultation and benefits were ironed out. But lack of attention to issues important to the communities that are affected by the research, and on whose behalf the research is purportedly done, can also slow or even halt research5,6 and breed a deep distrust of scientists that can only hurt efforts to carry out, or raise funding for, future research. Therefore, time spent making explicit the ever-present ethical and social issues and incorporating them into study design is better conceptualized as an integral part of the research process than as 'extra' time.

The HapMap has been an exemplar of integrated and proactive ELSI analysis in genetic variation research. Similar efforts have been organized for other genetic research projects, such as the development of a pharmacogenetics research network and database7 funded by the US National Institutes of Health. Far less attention has been paid to the application of genetic variation research for nonmedical purposes, however.

Nonmedical applications

One example of the need for more involvement of geneticists in ELSI considerations is in the application of human genetic variation research for forensic uses, particularly criminal identification. The same kinds of data that are used to analyze genetic differences between humans for medical purposes are also used in courts of law to determine identity. In a legal setting, the uncertain validity of certain analytic methods and the data they produce, especially those used to infer race from DNA sequences, is particularly troubling. Although the Federal Bureau of Investigation's DNA Advisory Board and other associated technical and scientific working groups have been active over the last decade, insufficient attention has been paid to the genetic, public policy or legal implications of these applications.

Over the last 10 years, the availability of DNA samples and of techniques for rapid DNA sequencing has created a vast body of human genetic variation research for forensic purposes. Standardized systems have been developed and rapidly adopted worldwide for determining whether DNA in a sample from a suspect matches that in a sample from a crime scene. The most commonly used systems in the US and the UK analyze fixed sets of short tandem repeat (STR) loci8,9. Setting laboratory error aside, lack of a match between the STRs in a crime-scene sample and those in the suspect's DNA sample eliminates that person as a suspect.

Conversely, a match between the two sets of STRs is typically presented as evidence that crime scene DNA came from the suspect. But this conclusion cannot be made with 100% certainty because the two samples are compared only at a limited number of loci. Hence, all conclusions of identity or nonidentity between two samples must be probabilistic. It is in trying to improve the precision of these probability calculations that forensics brings in concepts of race and ethnicity.

If samples from a crime scene and from a suspect are determined to match at selected STR loci, the next step is to determine the probability that this match could have occurred by chance. This is called the match probability, and its calculation requires determining how commonly the alleles occur at the analyzed loci. If the alleles in the crime scene sample occur commonly, the chance is higher that the sample could have come from someone other than the suspect. But the crucial question is not only whether the alleles are commonly found, but among whom. That is, what is the relevant population for any given analysis on which to base an STR allele frequency? Ideally, this probability would be determined by analyzing the DNA of the entire population of people who could conceivably have left DNA at the crime scene and then calculating the frequency of the crime scene sample's DNA pattern in this population. This is impractical. Instead, forensic geneticists typically use reference databases categorized by race and ethnicity to calculate probabilities.

The decision to use these databases was stated in a report on forensic DNA typing produced by the US National Research Council (NRC)10,11,12. The NRC recommended that “[i]n general, the calculation of a profile frequency should be made with the product rule. If the race of the person who left the evidence-sample DNA is known, the database for the person's race should be used; if the race is not known, calculations for all the racial groups to which possible suspects belong should be made.”11
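The dependence of the product rule on the choice of reference database can be illustrated with a minimal sketch. It assumes Hardy-Weinberg proportions within each locus and independence across loci, which are the assumptions the product rule itself relies on. The locus names, allele labels, allele frequencies and the two reference databases are all hypothetical and are not taken from any real forensic database; corrections for population substructure are omitted.

```python
from math import prod


def genotype_frequency(allele_a, allele_b, freqs):
    """Hardy-Weinberg genotype frequency at one locus (p^2 or 2pq)."""
    p = freqs[allele_a]
    q = freqs[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q


def profile_frequency(profile, database):
    """Product rule: multiply per-locus genotype frequencies,
    assuming the loci are statistically independent."""
    return prod(
        genotype_frequency(a, b, database[locus])
        for locus, (a, b) in profile.items()
    )


# Hypothetical allele frequencies for two hypothetical reference databases.
database_a = {
    "locus_1": {"12": 0.2, "13": 0.3, "14": 0.5},
    "locus_2": {"8": 0.1, "9": 0.4, "10": 0.5},
}
database_b = {
    "locus_1": {"12": 0.4, "13": 0.4, "14": 0.2},
    "locus_2": {"8": 0.5, "9": 0.25, "10": 0.25},
}

# Hypothetical crime-scene profile: heterozygous at locus_1, homozygous at locus_2.
profile = {"locus_1": ("12", "13"), "locus_2": ("8", "8")}

freq_a = profile_frequency(profile, database_a)
freq_b = profile_frequency(profile, database_b)
print(f"profile frequency with database A: {freq_a:.4g}")
print(f"profile frequency with database B: {freq_b:.4g}")
```

In this toy case the same profile is estimated to be far rarer under one database than under the other, so the racial or ethnic label attached to a sample, and hence the database chosen, directly changes the reported match probability.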

Despite the NRC's recommendation, some researchers continue to debate the use of the product rule to calculate the probability of a random match between crime scene sample and suspect13. Concern has focused on the assumption that the genetic loci that are analyzed occur independently in all populations12. But the capacity of these methods to accurately and consistently distinguish individuals is less our concern here than the assumptions about race that these methods reinforce. For example, the NRC's recommendation implies that (i) the 'races' of individuals whose DNA was analyzed to determine allele frequencies in populations or of suspects can consistently be assigned and (ii) if the racial labels applied to crime-scene samples and those applied to the populations with which they are compared are the same, then the sample and populations will be genetically similar.

Geneticists, most notably Eric Lander and Bruce Budowle, were active participants in the debate over how to calculate the probability of a random match14,15,16. The final article in their exchange claims that “the scientific issues have all been resolved”. But a series of arguments and counterarguments about the association between 'race' and patterns of DNA markers has been unfolding in the medical genetics literature over the last four years, and these arguments are relevant to, and should include, forensic geneticists. A lively and constructive dialog, including people from various disciplines such as ethics, history, and anthropology, has taken place within the genetics research community about whether genetic markers can be associated with, or used as a proxy for, race or ethnicity in various kinds of medical research17,18,19,20. This debate includes the extent to which population substructure exists21,22 and whether race and ethnicity are useful for controlling for population substructure in genotype-phenotype correlation studies23,24 or for identifying groups for tailored medical treatments25,26. These dialogs have also begun to address the clinical and social implications of the inherent error in applying probabilistic population data to individuals. These conversations are directly relevant to the forensic genetics community but have not been widely extended into this group.

The assumption that socially fluid labels, such as racial and ethnic categories, can be assigned to individuals and populations based on their genetics is problematic for conceptual and practical reasons when applied to forensics, just as it is problematic when applied in the medical context18. The calculation of a match probability for criminal identification purposes calls for assigning race or ethnicity to a sample (often by undescribed methods), if 'known', and assigning race or ethnicity to the reference populations (also by unstandardized and poorly described methods) from which allele frequencies are calculated. It then assumes that these assignments correspond. That is, if a sample is labeled as being from a 'black' individual, this person is considered genetically equivalent in some way to populations labeled 'black'.

We know, however, that the use of such labels varies widely over time and geographical space, so that such correspondence is not assured. For example, in the US, people with ancestry from India are sometimes labeled Asian and sometimes labeled white or 'Caucasian'27; they are not classified in the same way in the UK as in the US20. Self-identification of race or ethnicity does not solve the correspondence problem if the label on the individual sample does not correspond to the category assigned to data in DNA databases. Self-identification also does not solve the fluidity problem, because people usually self-identify to categories imposed externally (such as those used in the census), and those labels constantly change. Furthermore, we know that individual self-classification is not stable; for example, one US study found that one-third of people change their own self-identified race or ethnicity in two consecutive years28.

There is an urgent need to expand this debate into the field of forensics, for at least two reasons. Both signal an extension of DNA typing into new arenas of criminal identification. The first is a kind of 'function creep' whereby the functions of DNA profiling are gradually expanded29 on a basis that is scientifically controversial, if not questionable. The second is the US government's expansion of populations to whom these techniques can be applied.

Function creep

Analysis of STRs in human DNA was initially developed to determine the identity or nonidentity of a sample of unknown origin with a sample of known origin. In this way, crime-scene samples could be compared with those collected from suspects or victims of a crime, or unidentified battlefield remains could be compared with DNA samples collected from enlisted soldiers to identify them.

The same kinds of analysis, however, have now been used to create suspects where there are none, with the new, stated assumption that patterns of STRs are associated with visually identifiable physical characteristics. The weak predictive power of the STR loci is demonstrated in an article reporting “a method for inferring the ethnic origin of a DNA sample profiled using the SGM [second generation multiplex]” in five British populations (classified in the paper as Caucasian, Afro-Caribbean, Indian sub-continental, Southeast Asian and Middle Eastern)30. In an attempt “to discriminate between the ethnic groups in the suspect population...a set of 10 000 profiles was simulated from each of the five ethnic groups considered here, using allele distributions estimated from the data. For every profile in a set, its probability within each ethnic group was estimated.”30 (Table 1).
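The general shape of such a simulation can be sketched as follows. This is not the cited authors' implementation: it simply draws profiles from each group's allele distributions, estimates each profile's probability within every group, assigns the profile to the group with the highest probability, and tabulates true versus predicted group. The two groups, their loci and their allele frequencies are hypothetical, and 2,000 profiles per group are simulated here rather than 10,000 to keep the example quick.

```python
import random
from math import prod


def genotype_frequency(a, b, freqs):
    """Hardy-Weinberg genotype frequency at one locus (p^2 or 2pq)."""
    p, q = freqs[a], freqs[b]
    return p * p if a == b else 2 * p * q


def profile_probability(profile, group_freqs):
    """Probability of a profile within one group, assuming independent loci."""
    return prod(
        genotype_frequency(a, b, group_freqs[locus])
        for locus, (a, b) in profile.items()
    )


def simulate_profile(group_freqs, rng):
    """Draw two alleles per locus according to the group's allele frequencies."""
    profile = {}
    for locus, freqs in group_freqs.items():
        a, b = rng.choices(list(freqs), weights=list(freqs.values()), k=2)
        profile[locus] = (a, b)
    return profile


def misclassification_table(groups, n_profiles, rng):
    """Count, for each true group, how often each group is predicted."""
    table = {true: {pred: 0 for pred in groups} for true in groups}
    for true_group, freqs in groups.items():
        for _ in range(n_profiles):
            profile = simulate_profile(freqs, rng)
            predicted = max(
                groups, key=lambda g: profile_probability(profile, groups[g])
            )
            table[true_group][predicted] += 1
    return table


# Hypothetical groups with overlapping allele distributions.
groups = {
    "group_A": {
        "locus_1": {"x": 0.5, "y": 0.3, "z": 0.2},
        "locus_2": {"x": 0.4, "y": 0.4, "z": 0.2},
    },
    "group_B": {
        "locus_1": {"x": 0.4, "y": 0.3, "z": 0.3},
        "locus_2": {"x": 0.3, "y": 0.4, "z": 0.3},
    },
}

n = 2000
rng = random.Random(0)  # fixed seed so the run is repeatable
table = misclassification_table(groups, n, rng)
for true_group, row in table.items():
    print(true_group, row)
```

Because the hypothetical allele distributions overlap, many simulated profiles are more probable in a group other than the one they were drawn from, which is what the off-diagonal cells of a misclassification table record.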

Table 1 Misclassification table comparing true versus predicted ethnic group

Classifications into the five 'ethnic' groups were assigned by police officers by visual characteristics: “The profiles included in the databases were therefore generated from criminal justice (CJ) samples taken when individuals were arrested for an offence. Designation of ethnic group was by police officers and was based on appearance rather than any knowledge of an individual's ancestry.”30

This example brings up a number of areas of debate about the relationship between race, ethnicity and genes that have been raised in the biomedical literature and should also be addressed in the forensics literature.

First, the fluidity of racial and ethnic categories should be acknowledged. As has been discussed at length in the medical literature, racial and ethnic classification by appearance is often inconsistent20,31. Thus, even if individuals could consistently be clustered into groups by genetic profiles, the correlations between visual and genetic classifications (Table 1) would be low.

Nevertheless, this type of DNA analysis has been used to create suspect pools based on race, as in the case of a serial killer in Louisiana. Police were looking for a 'white' male, but DNA from the crime scene suggested that the perpetrator was of “African and American Indian ancestry”32, implying that such a person could not be 'white'.

In addition to the difficulties posed by the social fluidity of race and ethnicity that make them such problematic variables in genetic research, several other issues have been raised and debated in the medical literature. These include (i) inadequate description of methods for assigning race and ethnicity to populations33,34,35,36; (ii) the problem of sampling small or isolated populations and generalizing to larger groups such as 'Africans'26; and (iii) the validity of using small numbers of genetic loci to group individuals by ancestry when hundreds of markers might be necessary21,35. Non-STR genetic markers may have a better correlation with phenotype37 but to the extent that these correlations are made by race, many of the concerns discussed above still apply.

Broadening DNA collection

At the same time that the analytic uses of DNA collected for forensic purposes are gradually expanding (under assumptions that are being increasingly challenged in the medical research community), the databases of DNA data on criminals, with which suspect or victim DNA samples can be compared, are also expanding to include people other than those for whom they were originally intended (sex offenders)38.

DNA evidence is now admissible in courts of virtually all jurisdictions in the US and in other countries39. In the US, STR profiles of DNA collected from crime scenes are compared with DNA profiles collected locally (including, but not limited to, those of suspects for the crime in question). If there is no match with the local database, the profile can be compared with state and then national (National DNA Index System) collections. Together, the local, state and national databases are known as the Federal Bureau of Investigation Combined DNA Index System (CODIS) database, authorized by the DNA Identification Act of 1994 (ref. 40) for law enforcement purposes. According to the CODIS website, “CODIS enables federal, state, and local crime labs to exchange and compare DNA profiles electronically, thereby linking crimes to each other and to convicted offenders”. CODIS obtains DNA profiles from individual states (now 49 states, the US Army Crime Lab, the Federal Bureau of Investigation and Puerto Rico). States determine which DNA profiles are acceptable for inclusion, now mostly convicted sex offenders but increasingly many other categories of felons. Some states have expanded their databases to include all felons or even all arrested persons. As of April 2004, the National DNA Index System contained 1,762,005 DNA profiles, including 80,302 crime-scene samples and 1,681,703 from convicted offenders.

Recent legislative efforts suggest that this number will probably increase rapidly. Bills have been introduced in the US House of Representatives and the Senate to give states the authority to expand the CODIS database so that it could potentially include DNA profiles from “arrestees and persons who have been charged but not yet convicted, juvenile offenders, and persons convicted of misdemeanors”41. The current California ballot includes Proposition 69, which would require “collection of DNA samples from all felons, and from adults and juveniles arrested for or charged with specified crimes, and submission to state DNA database; and, in five years, from adults arrested for or charged with any felony”42. Given that arrest patterns are already biased towards racial and ethnic minorities41, the increased inclusion of individuals from these groups in DNA databases, even if they are not convicted of a crime, raises the potential that members of these groups will in future be 'identified' as perpetrators of a crime, seemingly conclusively, by what are actually probabilistic and scientifically evolving standards.

Information based on genetic variation data has serious implications for individuals and groups outside the clinical arena. Police have used it to justify 'DNA dragnets' that collected tissue samples from hundreds of 'volunteers', as in the Louisiana serial killer case. The company that carried out the DNA analysis in this case billed it as a success because it “helped lead investigators to a suspect” (DNAPrint website), Derrick Todd Lee. Others claim, however, that “the Louisiana dragnet didn't catch Lee. That was done by alert detectives who picked up a lead from an unrelated case. But it did give the task force investigating the murders the DNA of hundreds of innocent men.”

DNA dragnets are damaging to civil liberties, especially because DNA samples have been taken without probable cause from people who are not suspects, not truly voluntarily43,44,45, and without provisions in the law for destroying or returning them43,45. Although technically the samples are collected from volunteers, in practice the standards for sample collection are quite different from those used for medical research. The typical research consent process, which in many countries requires oversight and approval by institutional bodies, has explicit written provisions for withdrawing from the study and for the disposition of data and tissue samples. In contrast, the collection of DNA samples for criminal investigation purposes has no provisions for destroying or returning samples of those found innocent. Individuals have had to sue in order to retrieve their tissue44,45,46. Furthermore, unlike in medical research, the consequences of declining to 'volunteer' a DNA sample in a dragnet are social stigmatization, coercion or forcible collection of tissue samples by other means45. In at least one criminal investigation, those who did not 'volunteer' to give DNA samples were reportedly issued search warrants in order to obtain DNA44. Those who decline to 'volunteer' face social ostracism because of the belief that “if you don't want to give your DNA, you've got something to hide”44. Attorney Barry Scheck, director of the Innocence Project, which advocates using postconviction DNA testing to exonerate the wrongfully convicted, said, “It's inherently coercive when a policeman comes to your door and says, 'Give us a sample of your blood and if you don't give it to us, you're a suspect.'”44

Reasonable arguments could be made that the standards of consent for taking DNA samples for medical research do not apply in their entirety to the taking of samples from convicted felons47. But these arguments do not similarly hold for taking DNA samples from individuals who are not convicted of crimes. Furthermore, it is clear that the term 'voluntary', in practice, means very different things in the worlds of medical research and criminal investigations. This discrepancy could be damaging to legitimate uses of DNA samples in both worlds.


The number of DNA profiles stored in CODIS and the genetic tools to analyze them will probably continue to grow, as will their combined impact on the criminal justice system. Although the expanded collection of DNA in itself should be a topic for public debate, the issue here is the uses to which these samples are put, envisioned and enabled by medical and nonmedical genetic research, and the role of scientists in shaping these uses. Attributing racial and ethnic labels to samples, a subject of considerable and still unresolved debate in medical genetics, seems well on its way to acceptance in forensics and the courtroom. Research that aims to extend use of these labels to support phenotypic or visual identification is still rare but interest in it is strong. Misuse of genetic research for nonmedical applications in the volatile arena of race will severely erode the public's trust in the application of genetics to health. This troubling prospect underscores the need for medical and ELSI researchers to look at applications of genetic research beyond the lab and clinic and to widen the dialog to a broader range of scientific and policy communities.