Analysis

Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome

Nature Biotechnology volume 23, pages 1383–1390 (2005)

Abstract

In contrast to the fairly reliable and complete annotation of the protein-coding genes in the human genome, comparable information is lacking for noncoding RNAs (ncRNAs). We present a comparative screen of vertebrate genomes for structural noncoding RNAs, which evaluates conserved genomic DNA sequences for signatures of structural conservation of base-pairing patterns and exceptional thermodynamic stability. We predict more than 30,000 structured RNA elements in the human genome, almost 1,000 of which are conserved across all vertebrates. Roughly a third are found in introns of known genes, a sixth are potential regulatory elements in untranslated regions of protein-coding mRNAs and about half are located far away from any known gene. Only a small fraction of these sequences has been described previously. A comparison with recent tiling array data shows that more than 40% of the predicted structured RNAs overlap with experimentally detected sites of transcription. The widespread conservation of secondary structure points to a large number of functional ncRNAs and cis-acting mRNA structures in the human genome.
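
The abstract names two signatures, structural conservation and exceptional thermodynamic stability, without saying how they are scored. The sketch below shows one common way such signatures can be quantified: a z-score of the native folding energy against shuffled controls, and a structure conservation index that compares a consensus folding energy with the mean energy of the individual sequences. Treating these as the measures used in the screen is an assumption, the function names are hypothetical, and all energy values are invented for illustration. In practice the energies would come from an RNA folding program.

```python
from statistics import mean, stdev


def stability_z_score(native_mfe, shuffled_mfes):
    """z-score of the native minimum free energy (MFE) against shuffled controls.

    A strongly negative value means the native sequence folds into a more
    stable structure than expected for sequences of the same composition.
    """
    mu = mean(shuffled_mfes)
    sigma = stdev(shuffled_mfes)  # sample standard deviation
    if sigma == 0:
        raise ValueError("shuffled energies have zero variance")
    return (native_mfe - mu) / sigma


def structure_conservation_index(consensus_mfe, individual_mfes):
    """Ratio of the consensus folding energy to the mean individual MFE.

    Values near 1 suggest the sequences share a common fold; values near 0
    suggest no shared structure.
    """
    avg = mean(individual_mfes)
    if avg == 0:
        raise ValueError("mean individual MFE is zero")
    return consensus_mfe / avg


# Hypothetical energies in kcal/mol, made up for this example.
native = -30.0
shuffled = [-20.0, -22.0, -18.0, -21.0, -19.0]
z = stability_z_score(native, shuffled)

sci = structure_conservation_index(-27.0, [-30.0, -28.0, -32.0])

print(f"z-score: {z:.2f}")
print(f"SCI: {sci:.2f}")
```

With these invented numbers the native sequence is far more stable than its shuffled controls and the consensus energy is close to the individual average, which is the pattern a screen of this kind would flag as a candidate structured RNA.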

Acknowledgements

This work was supported in part by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung, Project No. P15893, by the German DFG Bioinformatics Initiative BIZ-6/1-2, and by the Austrian Gen-AU bioinformatics integration network sponsored by bm:bwk.

Author information

Affiliations

  1. Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, 1090, Austria

    • Stefan Washietl
    • Ivo L Hofacker

  2. Division of Genomics and RNomics, Innsbruck Medical University-Biocenter, Fritz-Pregl-Strasse 3, Innsbruck, 6020, Austria

    • Melanie Lukasser
    • Alexander Hüttenhofer

  3. Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Härtelstrasse 16-18, Leipzig, D-04107, Germany

    • Peter F Stadler

  4. Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, 87501, New Mexico, USA

    • Peter F Stadler

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Peter F Stadler.

Supplementary information

  1. Supplementary Fig. 1

    Northern Blot analysis of five H/ACA snoRNA candidates. (PDF 30 kb)

  2. Supplementary Table 1

    Detailed results of the native screen and the random control screen. (PDF 12 kb)

  3. Supplementary Table 2

    MicroRNAs missing from the input set. (PDF 11 kb)

  4. Supplementary Table 3

    H/ACA snoRNAs missing from the input set. (PDF 10 kb)

  5. Supplementary Table 4

    Selected ncRNAs from literature with conserved RNA secondary structures detected in our screen. (PDF 11 kb)

  6. Supplementary Table 5

    50 Selected RNAz Hits in intergenic regions overlapping with 'transfrag' transcriptional map. (PDF 347 kb)

About this article

DOI

https://doi.org/10.1038/nbt1144