Much genetic variation between individuals may lie in regions of the genome that differ in structure. A new study reveals more than 1,700 such regions, nearly half of which had not previously been sequenced. Differences in these regions between individuals are likely to hold clues to many diseases, as well as to the evolutionary processes that shaped human history.

Previous studies had documented changes in structural features of the genome — inversions, deletions and duplications affecting from a few thousand to a few million base pairs. “We knew there was lots of structural variation out there,” says Evan Eichler, a human geneticist at the Howard Hughes Medical Institute, University of Washington in Seattle and the lead author of the study. “But we didn't have any sequence-based resolution or any systematic approach to really capture that variation.”

Yet knowing what types of variation exist in these regions, and precisely where in the genome the regions lie, was likely to be important. Regions of structural variation are thought to be unstable and rapidly evolving, and some contain genes likely to have emerged relatively recently. Humans and chimpanzees are 98.9% identical in sequence, with some 35 million base-pair differences between them. Structurally variable regions of the human genome account for more than three times that number of base-pair differences between humans and chimpanzees, says Eichler.
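The figures in this comparison can be checked with a little arithmetic. The sketch below is only a rough illustration: the genome length of about 3.1 billion base pairs is an assumption brought in for the calculation and is not stated in the article, and "more than three times" is treated as a lower bound.

```python
# Rough consistency check of the human-chimpanzee figures quoted above.
# GENOME_SIZE_BP is an assumed, approximate value (not from the article).
GENOME_SIZE_BP = 3_100_000_000

SEQUENCE_IDENTITY = 0.989          # "98.9% identical in sequence"
QUOTED_DIFFERENCES = 35_000_000    # "some 35 million base-pair differences"

# Differences implied by 98.9% identity over the assumed genome length.
implied_differences = (1 - SEQUENCE_IDENTITY) * GENOME_SIZE_BP

# "More than three times that number" gives a lower bound for the
# base-pair differences attributed to structurally variable regions.
structural_lower_bound = 3 * QUOTED_DIFFERENCES

print(f"Implied by identity: about {implied_differences / 1e6:.0f} million bp")
print(f"Structural regions: more than {structural_lower_bound / 1e6:.0f} million bp")
```

Under that assumed genome length, the 98.9% figure and the quoted 35 million differences agree to within rounding, and the structural regions would account for more than 105 million base pairs of difference.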

To find one type of these prone-to-change regions, Eichler and Jeffrey Kidd, a doctoral student in his lab and first author on the paper on page 56, devised a method to detect what Eichler calls “one-armed bandits”. They created libraries of more than a million overlapping pieces of DNA spanning the genomes of eight individuals of diverse geographic ancestry. Kidd then pulled out fragments that matched a reference sequence of the human genome at one end, which let him map their locations precisely, but whose other end, or “arm”, did not match in length and/or orientation. He looked at these regions in more detail, sometimes at the sequence level, in the genomes of the eight individuals.
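The screening idea described here can be sketched in a few lines of Python. This is a simplified illustration, not the group's actual pipeline: the `Fragment` record, the `classify` function, the expected span range and the rule that the two ends should map to opposite strands are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Hypothetical range (in base pairs) for the distance between the two mapped
# ends of a normal fragment. These thresholds are assumptions for illustration.
MIN_SPAN = 32_000
MAX_SPAN = 48_000


@dataclass
class Fragment:
    name: str
    anchor_pos: int     # reference coordinate where the matching end maps
    arm_pos: int        # reference coordinate where the other end ("arm") maps
    anchor_strand: str  # "+" or "-"
    arm_strand: str     # "+" or "-"


def classify(frag: Fragment) -> str:
    """Label a fragment as concordant or name the way it is discordant."""
    # Assumed convention: the two ends of a normal fragment map to opposite
    # strands. Same-strand ends hint at an inversion.
    if frag.anchor_strand == frag.arm_strand:
        return "orientation"
    span = abs(frag.arm_pos - frag.anchor_pos)
    # Ends farther apart on the reference than the fragment is long: the
    # individual may lack sequence that the reference has.
    if span > MAX_SPAN:
        return "too_long"
    # Ends closer together than expected: the individual may carry sequence
    # that the reference lacks.
    if span < MIN_SPAN:
        return "too_short"
    return "concordant"


fragments = [
    Fragment("a", 100_000, 140_000, "+", "-"),
    Fragment("b", 100_000, 190_000, "+", "-"),
    Fragment("c", 100_000, 110_000, "+", "-"),
    Fragment("d", 100_000, 140_000, "+", "+"),
]

# Keep only the discordant fragments, the "one-armed bandits" of the text.
discordant = {f.name: classify(f) for f in fragments if classify(f) != "concordant"}
print(discordant)
```

In this toy setup only the discordant fragments are carried forward for closer inspection, which mirrors the step in which such fragments were examined in more detail.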

“Wherever he found these discordant fragments, he found missing parts of the human genome that were in some individuals and not in others,” says Eichler. By piecing together the sequence of nucleotides within these regions, the group produced the first high-resolution sequence map of structural variation.

The results show the need for complementary approaches to human genomic sequencing, says Eichler. Most sequencing technologies are designed to detect only small variations, such as single-nucleotide substitutions. In these systems, DNA from one individual is examined and the resulting sequence aligned to that of a reference genome — with no means to retrieve regions of variability that don't line up. “If we just sequence multiple humans without being comprehensive, we're not going to capture these complex regions thoroughly enough,” Eichler says.

Eichler's group is now trying to find associations between regions of structural variation and conditions such as autism, epilepsy and diabetes. In addition, the team is interested in comparing the function of genes found within these regions in humans and other primates to find evolutionary clues as to how humanness arose.