Published online 18 May 2010 | Nature | doi:10.1038/news.2010.248

News

Existence of RNA 'dark matter' in doubt

The abundance of transcripts from the genome may have been overestimated.

Biomedical illustration of transcription. Much of the human genome may not be transcribed into RNA. (C. & M. WERNER, VISUALS UNLIMITED / SCIENCE PHOTO LIBRARY)

RNA 'dark matter' hinted at by previous studies of mammalian genomes may not exist after all. The term refers to the large amounts of RNA that are copied from the DNA sequence, or transcribed, but that cannot be accounted for by the genes identified so far.

Using next-generation sequencing technology, researchers based in Canada have found that, in human and mouse cells, most RNA transcripts are copies of regions within or near genes that are known to code for proteins or to regulate gene expression. The finding disputes earlier work claiming that the vast majority of the mammalian genome — including the 98% or so that does not code for proteins — is transcribed into RNA.

Over the past several years, reports have suggested that much more of the mammalian genome is transcribed into RNA than could be accounted for by known genes [1,2,3]. Some researchers believed this added to the evidence that non-coding DNA in our genome, sometimes known as 'junk' DNA, could be functional.

Most of these studies used tiling microarray technology, in which thousands of probes are used to assess the presence of RNA transcripts. But this method is known to suffer from limitations: some probes can become attached to inappropriate sequences, for example. More recent research, in which RNA transcripts are sequenced directly using a technique called RNA-Seq, has hinted that the amount of the mammalian genome transcribed might not be as great as the earlier work suggested [4].

In the latest study, researchers led by Harm van Bakel of the University of Toronto in Ontario confronted the discrepancy directly by analysing samples from the same tissues, using both tiling microarrays and RNA-Seq. The team found that whereas the microarrays reported many mysterious transcripts, the RNA-Seq technology found few transcripts other than those linked to genes coding for proteins. The team's work is published in PLoS Biology [5].

RNA-Seq is generally accepted to be more reliable than microarray technologies, says co-author Timothy Hughes, also at the University of Toronto, especially when analysing transcripts found at low concentrations. "We suspected for a long time that the tiling arrays were really noisy," Hughes says.

This is "a nice, balanced study that gives a lot of credibility" to the idea that dark matter transcription has been vastly overestimated, says sequencing expert Eric Schadt of Pacific Biosciences in Menlo Park, California, who was not involved in the study. "This direct comparison between the microarray-based technologies and the RNA sequencing was the exact right thing to do." The work should help scientists "think through the conflicting results that were being reported", adds Schadt, who has published work using microarrays showing massive transcription of the genome [6].

Transcription puzzle

Although the majority of the transcripts identified were associated with genes that code for proteins, many of these transcripts had still not been reported before — which leaves open the question of what their function might be, van Bakel says. Some may represent different RNA transcripts from the same gene, whereas others may be involved with regulating DNA transcription.

Of the transcripts that weren't associated with known genes, most were either very short transcripts or found at very low levels — both signs of simple background 'noise' in transcription. So these transcripts may have no function at all, van Bakel says.

But Philipp Kapranov of Helicos BioSciences, a biotechnology firm in Cambridge, Massachusetts, says that his lab, which also uses RNA-Seq, is seeing quite different results. He says that he and his colleagues continue to find a high percentage of a cell's RNA originating from regions between genes. "I don't know why the results presented in this paper are different from ours," he says, suggesting that there may be differences in their sequencing protocols.

Whether the transcripts are associated with genes or not, the true test of their functional relevance will probably come from studies in humans, Schadt says. Looking to see whether the presence of specific RNA transcripts correlates with gene expression or disease traits across a population "will let us know whether we should be paying attention to these types of sequences", he says. 

  • References

    1. Kapranov, P. et al. Science 296, 916-919 (2002).
    2. Cheng, J. et al. Science 308, 1149-1154 (2005).
    3. Bertone, P. et al. Science 306, 2242-2246 (2004).
    4. van Bakel, H. & Hughes, T. R. Brief Funct. Genomic Proteomic 8, 424-436 (2009).
    5. van Bakel, H., Nislow, C., Blencowe, B. J. & Hughes, T. R. PLoS Biol. 8, e1371 (2010).
    6. Schadt, E. E. et al. Genome Biol. 5, R73 (2004).