Science is at the mercy of its language. It can be difficult for researchers to communicate what most excites them about the beauty, intricacy and complexity of the natural world. And when words fail, debates and arguments often arise.
One enduring debate has been resurrected by ENCODE, the Encyclopedia of DNA Elements — an ongoing multimillion-dollar project to catalogue the functional elements of the human genome. A headline-grabbing claim, first made in this publication last September, was that roughly 80% of human DNA had been ascribed some “biochemical function” thanks to the efforts of more than 440 scientists (The ENCODE Project Consortium Nature 489, 57–74; 2012).
That percentage is remarkably high, in part because of a broad definition of ‘function’. The ENCODE team used the term to include binding by a regulatory protein, or transcription into RNA — activities identified as widespread. But almost immediately, other scientists began to take this definition to task, calling it essentially meaningless.
Some background is useful. Genomes vary dramatically in size — sometimes irrespective of the complexity of the organism. Take, for example, the genome of the marbled lungfish (Protopterus aethiopicus), which clocks in at an excessive 133 billion base pairs. That of the pufferfish (Takifugu rubripes), by contrast, sports only 365 million.
By suggesting that humans have little genomic redundancy, the ENCODE paper implies that the 3.2-billion-base-pair human genome hits a sweet spot in efficiency. Critics suggested, sometimes sharply, that this was both anthropocentric and ignorant of how evolution shapes the genome. Much of human DNA is non-functional, they insisted. It is a relic of history, garbled by mutation and essentially junk.
The most recent formal critique, published this week, suggests that similar analyses on organisms with very large and very small genomes would probably find the same density of functional elements (W. F. Doolittle Proc. Natl Acad. Sci. USA http://doi.org/kr3; 2013). This investigation has yet to be done.
The debate over ENCODE’s definition of function retreads some old battles, dating back perhaps to geneticist Susumu Ohno’s coinage of the term junk DNA in the 1970s. The phrase has had a polarizing effect on the life-sciences community ever since, despite several revisions of its meaning. Indeed, many news reports and press releases describing ENCODE’s work claimed that by showing that most of the genome was ‘functional’, the project had killed the concept of junk DNA. This claim annoyed both those who thought it a premature obituary and those who considered it old news.
There is a valuable and genuine debate here. What, if anything, the billions of non-protein-coding base pairs in the human genome do, and how they affect cellular and system-level processes, remains an important and open question. Ironically, it is a question from which the language of the current debate may detract. As Ewan Birney, co-director of the ENCODE project, noted on his blog: “Hindsight is a cruel and wonderful thing, and probably we could have achieved the same thing without generating this unneeded, confusing discussion on what we meant and how we said it” (see go.nature.com/8xorge).
The ferocity of the criticism has no doubt been fuelled by dissatisfaction over ENCODE’s top-down, big-science approach and the large share of research funds that it has attracted. Many biologists have called the 80% figure more a publicity stunt than a statement of scientific fact. Nevertheless, ENCODE leaders say, the data resources that they have provided have been immensely popular. So far, papers that use the data have outnumbered those that take aim at the definition of function.
The debate sounds like a matter of definitional differences. But to dismiss it as semantics minimizes the importance of words and definitions, and of how they are used to engage in research and to communicate findings. ENCODE continues to collect data and to characterize what the 3.2 billion base pairs might be doing in our genome and whether that activity is important. If a better word than ‘function’ is needed to describe those activities, so be it. Suggestions on a postcard please.