Findings reveal how DNA is conserved across animals
The latest studies of the instructions embedded in the human genome are revealing how evolution has shaped our species.
On page 799 of this issue [1,2], and in a themed issue of Genome Research [3], scientists report the first findings from a project called ENCODE. This 'encyclopedia of DNA elements' attempts to discover how our cells make sense of the DNA sequence in the human genome. Already, ENCODE is upending one piece of conventional scientific wisdom: the idea that biologically relevant DNA resists change over evolutionary time.
ENCODE aims to catalogue all the “functional elements” in the genome: the DNA sequences that control how and when our cells use our genes. Most of these controls seem to be written into so-called non-coding DNA, which does not encode a protein. Because organisms depend on functional elements working correctly, scientists have long thought that such elements should not change much over evolutionary time. So researchers have mostly looked for key functional elements in non-coding DNA that has stayed largely unchanged across species, known as conserved or constrained DNA.
But ENCODE is the first project to compare long stretches of non-coding DNA across many mammals, from mice to monkeys to humans. This comparison suggests that evolutionary processes don't always freeze functional DNA in place.
“The fact that we found so much functional sequence that did not seem to be evolutionarily constrained across all mammals is really surprising,” says Elliott Margulies of the National Human Genome Research Institute in Bethesda, Maryland, who co-chaired one of the ENCODE analysis groups.
The finding comes from the ENCODE pilot project, which used multiple methods to collect and analyse data on just 1% of the human genome — not an easy task (see 'Scaling up to a monumental task'). In one part of the project, groups of experimental biologists used a suite of laboratory techniques to find out what portions of the genome might be functional. Meanwhile, groups of computational biologists compared the ENCODE sequences across humans and 28 other animals to find constrained regions of DNA that had changed little throughout evolution.
But when the different groups compared their results, they found that their predictions about key portions of the genome didn't always agree: the experimental biologists' list of functional sequences didn't match the computational biologists' list of constrained sequences.
At first, many were sceptical of this result, says John Stamatoyannopoulos of the University of Washington in Seattle, a co-chair of one of the ENCODE analysis groups. “It raised some eyebrows,” he says. “But eventually all the ENCODE groups started coming out with the same thing.” Overall, biologists found no evidence of function for about 40% of the constrained ENCODE regions. On the flipside, about half of the functional elements found in non-coding DNA were totally unconstrained.
The finding that many constrained regions showed no evidence of function is not too surprising, because ENCODE probably did not run enough tests on enough types of cells to capture every major aspect of biology. But the idea that important DNA might also be unconstrained is newer, and intriguing, because it undermines the assumption that biological function requires evolutionary constraint.
“We're generalizing this principle over mammals, and over many functional elements,” says Ewan Birney, head of genome annotation at the European Bioinformatics Institute in Cambridge, UK, and a leader of ENCODE. “We're coming out quite strongly that this is not merely a curiosity of our genome — it's a really important part of the way our genome works.”
But how can major components of the mammalian genome change essentially at random over time? That is not entirely clear. The authors of the ENCODE paper speculate that the unconstrained genomic regions are evolving “neutrally”: that is, they are constantly changing in ways that are neither good nor bad for the individual. If so, many genetic changes in these regions simply don't affect overall biology.
This has major consequences for understanding the relationship between genetics and biology, Birney says. “It means, for example, that if you look at some conserved piece of biology — say, how the kidneys work in mice and humans — not all of those bits of biology will be conserved or constrained at the level of the DNA bases, and that's quite a strong shift.”
But not everyone agrees with that take. For example, John Mattick at the University of Queensland in Brisbane, Australia, argues that the widely accepted calculation of the baseline, or neutral, rate of mammalian evolution is flawed. Because measurements of constraint rely on a comparison with the neutral rate, many of ENCODE's so-called unconstrained regions may in fact be constrained, Mattick argues.
“I would have said that this finding suggests that many regions of our genome are evolving under weak selection pressure, or that our measurements of the neutral rate of evolution are incorrect,” says Mattick, who is an author on the ENCODE paper.
In fact, Mattick thinks scientists are vastly underestimating how much of the genome is functional. He and Birney have placed a bet on the question. Mattick thinks at least 20% of the candidate functional elements in our genome will eventually prove to be functional; Birney thinks fewer will. The loser will buy the winner a case of the beverage of his choice.
Meanwhile, other scientists are gathering data to answer the new questions raised by ENCODE. Many hope that other ongoing studies, such as efforts to sequence the genomes of additional primate species, will help decide which parts of the ENCODE data to study first.
1. The ENCODE Project Consortium. Nature 447, 799–816 (2007).
2. Greally, J. M. Nature 447, 782–783 (2007).
3. Genome Res. 17, Issue 6 (2007).
4. Nature 447, 361 (2007).
Check, E. Genome project turns up evolutionary surprises. Nature 447, 760–761 (2007). https://doi.org/10.1038/447760a