Tiny pieces of the genome can already explain many human characteristics. Erika Check Hayden looks at what they might reveal in the future.
In his 2000 State of the Union Address, President Bill Clinton chose to emphasize something he had recently heard from a genome researcher: that humans are all, irrespective of race, 99.9% the same genetically. "Modern science," he told his country's legislators, "has confirmed what ancient faiths have always taught: the most important fact of life is our common humanity." Seven years on, and four years after the final publication of the sequences from the Human Genome Project, new technologies and larger data sets are allowing genome biologists to answer a conundrum embodied in that unity-inspiring percentage: if our DNA is so similar, why do we seem different in so many ways?
The answer, in part, is that the genome is not as uniform as Clinton was led to believe; nor is it nearly as sedate, stable and homogeneous as scientists used to think. It's less a 'Book of Life', more a wiki; many of its ho-hum elements don't change, but some really interesting bits are constantly revised.
"Maybe 99% of our genome behaves in a nice, predictable way," says Gilean McVean, a statistical geneticist at the University of Oxford, UK. "But it's become clear that there is this pool of errant variants that are responsible for a lot of the dynamism in our genome, and we don't understand its consequences for disease risk or normal variation."
Over the past year, two large studies (refs 1, 2) have found evidence that many people carry around lots of large chunks of DNA that are deleted, copied, flipped or otherwise rearranged in other people. The findings confirm earlier studies that hinted at this type of 'structural' variation but were not large enough to command assent (ref. 3). A study of the genome of sequencing pioneer Craig Venter also found much more variation from reference sequences than expected (ref. 4).
The larger analyses estimate that such variable regions could make up more than 10% of the genome, vindicating scientists, such as Evan Eichler of the University of Washington in Seattle, who have long argued that structural variation is a major source of diversity. Scientists are still investigating how much it contributes to differences between populations. But it is already clearly linked to some differences between individuals that can be correlated with behaviour or environment. For example, a study published in September (ref. 5) reported that evolution has driven a starch-digestion gene to duplicate itself in people with traditionally starch-heavy diets.
"We're getting away from this 0.1% figure that has been in our minds ever since the draft human genome sequence came out," says Hunt Willard, head of the Institute for Genome Sciences and Policy at Duke University Medical Center in Durham, North Carolina. "We're now looking at maybe half a per cent of content that is unique to individual genomes." The actual variation is thus lower than the extent of the variable regions, but larger than previously thought. "Maybe Eichler always had that number in his head," adds Willard, "but no one else did."
That said, Willard points out that population geneticists such as Luca Cavalli-Sforza and Richard Lewontin arrived at a similar figure in pioneering studies linking protein and gene diversity to the history of human populations. What's different today is the scale and type of data available.
The HapMap, a catalogue of the variation seen in 270 people from America, Japan, China and West Africa, is a case in point. Today, McVean and a cast of hundreds publish a second-generation analysis of the HapMap (see page 851), the first phase of which was published in 2005 (ref. 6). The new, more thorough version finds surprisingly high diversity in single nucleotide polymorphisms (SNPs) — parts of the genome marked out by specific changes in a single DNA base pair.
The HapMap uses SNPs to identify chunks of DNA that tend to stay the same within populations. Researchers can then create an ordered list of SNPs, and thus DNA chunks, for each chromosome and choose one 'tag SNP' to stand in for the many SNPs that travel together on each chunk. But in the updated HapMap, 1% of the more than 3 million SNPs that have now been analysed cannot be grouped with their neighbours to mark identical chunks of DNA. These 'untaggable SNPs' reveal parts of the genome that vary greatly between people. "These untaggable SNPs are completely doing their own thing," McVean says. "It's not a high percentage of SNPs, but it's still a lot of them."
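The tagging idea can be made concrete with a toy example. What follows is a minimal sketch, not the HapMap consortium's own method: it assumes genotypes coded as 0, 1 or 2 copies of a variant, measures how closely two SNPs travel together with a squared correlation (r²), and uses a simple greedy rule with an assumed cut-off of 0.8 to pick tags. The SNP names and the six-person data set are invented for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a SNP that never varies says nothing about its neighbours
    return cov * cov / (var_x * var_y)


def pick_tag_snps(genotypes, threshold=0.8):
    """Greedy tagging sketch (threshold is an assumed cut-off).

    genotypes maps a SNP name to a list of per-person codes.
    Returns (tags, untaggable): tags maps each chosen tag SNP to the SNPs
    it stands in for; untaggable lists SNPs that pair with no other SNP.
    """
    names = list(genotypes)
    partners = {name: {name} for name in names}
    for a, b in combinations(names, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            partners[a].add(b)
            partners[b].add(a)

    # SNPs correlated with nothing but themselves cannot be tagged by a neighbour.
    untaggable = [name for name in names if len(partners[name]) == 1]
    remaining = set(names) - set(untaggable)

    tags = {}
    while remaining:
        # Pick the SNP that covers the most SNPs still lacking a tag.
        best = max(names, key=lambda name: len(partners[name] & remaining))
        covered = partners[best] & remaining
        tags[best] = sorted(covered)
        remaining -= covered
    return tags, untaggable


# Hypothetical data: six people, six SNPs (names are made up).
toy = {
    "rs_a": [0, 1, 2, 0, 1, 2],
    "rs_b": [0, 1, 2, 0, 1, 2],  # travels with rs_a
    "rs_c": [2, 1, 0, 2, 1, 0],  # mirror image of rs_a, so equally predictable
    "rs_d": [0, 0, 1, 1, 2, 2],  # only weakly related to the others
    "rs_e": [0, 2, 1, 1, 2, 0],
    "rs_f": [0, 2, 1, 1, 2, 0],  # travels with rs_e
}

tags, untaggable = pick_tag_snps(toy)
print(tags)        # two tag SNPs, each standing in for its chunk
print(untaggable)  # the SNP "doing its own thing"
```

In this toy set, two tags are enough to represent five of the six SNPs, while rs_d pairs with nothing and is left over, which is the situation McVean describes for roughly 1% of the real SNPs.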
Scientists are now obtaining DNA from seven more populations with African, Asian and European ancestry that could help explain the origin of the mystery SNPs. They are also discussing a massive new bout of sequencing in an international project involving Chinese, British and US funders that would use new technologies to sequence the genomes of 1,000 individuals. Along with the two individual genome sequences already released (ref. 4), these data will fuel a field that is set to explode over the next year: the hunt for genetic signatures that discriminate between smaller and smaller groups.
"The HapMap data can clearly tell you whether you are African or Chinese, but the question becomes, how far can you take that?" asks population geneticist Carlos Bustamante from Cornell University in Ithaca, New York. "Can you predict whether somebody comes from one village or another? We are going to see all kinds of stuff we would never have imagined was possible."
But does this just amount to expensive, and possibly divisive, genealogy? Pardis Sabeti at the Broad Institute in Cambridge, Massachusetts, thinks not. Today, she publishes an extensive study that uses the HapMap to identify specific genes linked to human diversity (see page 913). Over the past three years, Sabeti and other scientists have performed a series of studies finding evidence for 'positive selection' in chunks of DNA that differ between populations — indicating that genes are evolving differently in people from different parts of the world.
Sabeti now reports that she has pinpointed specific genes that seem to be responsible for some of the positive selection affecting these chunks. For instance, she found that variants of two genes linked to infection with the Lassa virus are favoured in West Africans.
Sabeti hopes that such studies will help guide scientists towards biological pathways involved in such regionally specific diseases. But her work also raises the sensitive issue of the biological meaning and relevance of race. Variants peculiar to Asian populations in another pair of genes — linked to hair, teeth and sweat glands — have no obvious links to disease. And there is the possibility that such population-specific variations might lead in uncomfortable directions.
In 2005, for instance, geneticist Bruce Lahn from the University of Chicago in Illinois suggested that two genes linked to brain size had evolved rapidly in groups that migrated out of Africa tens of thousands of years ago (refs 7, 8). His results prompted criticism among fellow scientists, who felt that he didn't have the proper evidence to back such an incendiary claim. Sabeti notes with relief that Lahn's genes haven't turned up in any genome-wide scan so far, another sign that his conclusions were unfounded. Lahn says that true tests of his work are beyond the scope of these approaches, and that he is using other methods, including resequencing parts of the genome, to bolster his conclusions.
"This is a very delicate time, and a dangerous time, as people start to come up with things that the general public, or the media, or various groups might misinterpret," Sabeti says. "I like the fact that, so far, the evidence we find for natural selection in humans is only skin deep."
1. Redon, R. et al. Nature 444, 444–454 (2006).
2. Stranger, B. E. et al. Science 315, 848–853 (2007).
3. Nature 437, 1084–1086 (2005).
4. Nature 447, 358–359 (2007).
5. Perry, G. H. et al. Nature Genet. 39, 1256–1260 (2007).
6. The International HapMap Consortium Nature 437, 1299–1320 (2005).
7. Evans, P. D. et al. Science 309, 1717–1720 (2005).
8. Mekel-Bobrov, N. et al. Science 309, 1720–1722 (2005).