RNA-binding proteins are key regulators of gene expression, yet only a small fraction have been functionally characterized. Here we report a systematic analysis of the RNA motifs recognized by RNA-binding proteins, encompassing 205 distinct genes from 24 diverse eukaryotes. The sequence specificities of RNA-binding proteins display deep evolutionary conservation, and the recognition preferences for a large fraction of metazoan RNA-binding proteins can thus be inferred from their RNA-binding domain sequence. The motifs that we identify in vitro correlate well with in vivo RNA-binding data. Moreover, we can associate them with distinct functional roles in diverse types of post-transcriptional regulation, enabling new insights into the functions of RNA-binding proteins both in normal physiology and in human disease. These data provide an unprecedented overview of RNA-binding proteins and their targets, and constitute an invaluable resource for determining post-transcriptional regulatory mechanisms in eukaryotes.



Gene Expression Omnibus

Data deposits

Raw and processed microarray data are available at GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE41235. The derived motifs and results of analyses are available at http://hugheslab.ccbr.utoronto.ca/supplementary-data/RNAcompete_eukarya/.




We thank H. van Bakel for computational support, A. Ramani and J. Calarco for discussions, Y. Wu, G. Rasanathan, M. Krishnamoorthy, O. Boright, A. Janska, J. Li, S. Talukder, A. Cote and S. Votruba for technical assistance, L. Sutherland for purchasing RBM5 protein and for feedback on the manuscript, S. Jain for software modified to create Fig. 2, and N. Barbosa-Morais for generating cRPKM values from autism RNA-seq data. We thank M. Kiledjian (PCBP1 and PCBP2), J. Stevenin (SRSF2 and SFRS7), S. Richard (QKI), M. Gorospe (TIA1), B. Chabot (SRSF9), A. Berglund (MBNL1), F. Pagani (DAZAP1), A. Bindereif (HNRNPL), M. Freeman (HNRNPK), E. Miska (LIN28A), K. Kohno (YBX1), M. Garcia-Blanco (PTBP1), R. Wharton (PUM-HD), C. Smibert (Vts1p) and M. Blanchette (Hrb27C, Hrb87F and Hrb98DE) for sending published constructs. This work was supported by funding from NIH (1R01HG00570 to T.R.H. and Q.D.M., R01GM084034 to K.W.L.), CIHR (MOP-49451 to T.R.H., MOP-93671 to Q.D.M., MOP-125894 to Q.D.M. and T.R.H., MOP-67011 to B.J.B., and MOP-14409 to H.D.L.), and the Intramural Program of the NIDDK (DK015602-05 to E.P.L.). K.B.C. and S.G. hold NSERC Alexander Graham Bell Canada Graduate Scholarships. M.T.W. was funded by fellowships from CIHR and CIFAR. H.S.N. holds a Charles H. Best Fellowship and was funded partially by awards from CIFAR to T.R.H. and B.J.F. M.I. is the recipient of an HFSP LT Fellowship.

Author information

Author notes

    • Debashish Ray
    • Hilal Kazan
    • Kate B. Cook
    • Matthew T. Weirauch
    • Hamed S. Najafabadi

    These authors contributed equally to this work.

    • Matthew T. Weirauch

    Present address: Center for Autoimmune Genomics and Etiology (CAGE) and Divisions of Rheumatology and Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229, USA.


  1. Donnelly Centre, University of Toronto, Toronto M5S 3E1, Canada

    • Debashish Ray
    • Matthew T. Weirauch
    • Hamed S. Najafabadi
    • Mihai Albu
    • Hong Zheng
    • Ally Yang
    • Hong Na
    • Manuel Irimia
    • Andrew G. Fraser
    • Benjamin J. Blencowe
    • Quaid D. Morris
    • Timothy R. Hughes
  2. Department of Computer Science, University of Toronto, Toronto M5S 2E4, Canada

    • Hilal Kazan
    • Quaid D. Morris
  3. Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada

    • Kate B. Cook
    • Xiao Li
    • Serge Gueroussov
    • Howard D. Lipshitz
    • Andrew G. Fraser
    • Benjamin J. Blencowe
    • Quaid D. Morris
    • Timothy R. Hughes
  4. Department of Electrical and Computer Engineering, University of Toronto, Toronto M5S 3G4, Canada

    • Hamed S. Najafabadi
    • Brendan J. Frey
    • Quaid D. Morris
  5. Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA

    • Leah H. Matzat
    • Ryan K. Dale
    • Elissa P. Lei
  6. Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA

    • Sarah A. Smith
    • Christopher A. Yarosh
    • Behnam Nabet
    • Russ P. Carstens
    • Kristen W. Lynch
  7. Department of Biochemistry, Emory University School of Medicine, Atlanta, Georgia 30322, USA

    • Seth M. Kelly
    • Anita H. Corbett
  8. Department of Biology and Center for Genomics and Systems Biology, New York University, New York, New York 10003, USA

    • Desirea Mecenas
    • Fabio Piano
  9. Molecular and Cellular Pharmacology Program, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA

    • Weimin Li
    • Rakesh S. Laishram
    • Richard A. Anderson
  10. Children’s Cancer Research Institute, UTHSCSA, San Antonio, Texas 78229, USA

    • Mei Qiao
    • Luiz O. F. Penalva




D.R., H.K., K.B.C., M.T.W. and H.S.N. made unique, essential and extensive contributions to the manuscript, and are ordered by amount of time and effort contributed. D.R. and H.K. developed most of the laboratory and computational components of RNAcompete, respectively. D.R., H.Z., A.Y., H.N., L.H.M., S.A.S., C.A.Y., S.M.K., B.N., D.M., W.L., R.S.L. and M.Q. cloned, expressed and purified the proteins. D.R. ran the RNAcompete assays, including data extraction. H.K. and K.B.C. processed the data, H.K. and K.B.C. generated motifs, and H.K., K.B.C., M.T.W. and H.S.N. performed the motif analyses. H.K. assembled the in vivo protein-RNA data sets. L.H.M. and R.K.D. performed and analysed RIP-seq data. K.B.C. developed the supplementary website and Figs 1 and 2 with assistance from H.K. and M.T.W. M.T.W. and M.A. created the cisBP-RNA database. M.T.W., H.S.N. and T.R.H. created Fig. 3. H.S.N. performed the analyses of human splicing, RNA stability data and human sequence conservation, and created Figs 4 and 5. M.I. and S.G. generated and analysed RNA-seq data and S.G. performed reporter-based RNA stability assays. X.L. performed Drosophila data analysis. H.D.L., F.P., A.H.C., R.P.C., B.J.F., R.A.A., K.W.L., L.O.F.P., E.P.L., B.J.B. and A.G.F. helped organize and support the project, and provided feedback on the manuscript. B.J.F., B.J.B. and A.G.F. provided critical advice and commentary on data analysis. Q.D.M. and T.R.H. conceived of the study, supervised the project and wrote the manuscript with contributions from D.R., H.K., K.B.C., B.J.B., A.G.F. and H.S.N.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Quaid D. Morris or Timothy R. Hughes.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Methods, Supplementary Figures 1-6, Supplementary Tables 1-4 and additional references.

Excel files

  1. Supplementary Data 1

    This file shows RNA-binding proteins with known consensus motifs. It contains panels for human and Drosophila listing RBPs with known consensus motifs as well as the PubMed ID of the publication that defined the motif.

  2. Supplementary Data 2

    The RNAcompete master file. This file contains data on all RNAcompete experiments indexed by motif ID, including: name, systematic ID and species of protein queried, the resulting motif, amino acid sequence of plasmid insert, and information on binding conditions used.

  3. Supplementary Data 3

    Secondary structure analysis. This file contains data panels in which each row corresponds to a significantly enriched secondary structure context for a given RNAcompete experiment, along with P-values and effect sizes. The Classification panel summarizes analysis results by motif.

  4. Supplementary Data 5

    Comparison of RNAcompete and literature motifs. This file shows the results of comparison with previously defined motifs for RNAcompete RBPs.

  5. Supplementary Data 6

    AUROC scores for in vivo and in vitro defined motifs on in vivo binding data. This file contains AUROCs for RNAcompete motifs on in vivo binding data described in Table S2, along with motifs learned by Malarkey on these data and AUROC scores for previously defined motifs for these RBPs.

  6. Supplementary Data 7

    Post-transcriptional regulation (PTR) analysis in human. This file contains additional details and results of PTR analysis in human, including predicted RBP-transcript regulatory networks for splicing and stability analysis.

  7. Supplementary Data 8

    Post-transcriptional regulation (PTR) analysis in Drosophila. This file contains details and results of PTR analysis for Drosophila, including lists of PTR categories enriched for RNAcompete-derived IUPAC motifs, weights of trained logistic regression classifiers, Drosophila RBP(s) associated with each IUPAC motif, and IUPAC motifs queried.

  8. Supplementary Data 9

    Sources of gene and Pfam models. This file details sources for gene and protein models for all organisms used in cisBP-RNA and in this paper. It also indicates the Pfam models used to scan for RBDs.

Text files

  1. Supplementary Data 4

    Clustered E-scores. This file contains the data matrix used in Figures 1b and S7.

