Systematic identification of mammalian regulatory motifs' target genes and functions

We developed an algorithm, Lever, that systematically maps metazoan DNA regulatory motifs or motif combinations to sets of genes. Lever assesses whether the motifs are enriched in cis-regulatory modules (CRMs), predicted by our PhylCRM algorithm, in the noncoding sequences surrounding the genes. Lever analysis allows unbiased inference of functional annotations for regulatory motifs and candidate CRMs. Using human myogenic differentiation as a model system, we statistically assessed more than 25,000 pairings of gene sets and motifs or motif combinations. We assigned functional annotations to previously predicted candidate regulatory motifs and identified gene sets that are likely to be co-regulated via shared regulatory motifs. Lever thus moves beyond the identification of putative regulatory motifs in mammalian genomes, toward an understanding of their biological roles. The approach is general and can be applied readily to any cell type, gene expression pattern or organism of interest.
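As an illustration of what assessing enrichment for a gene set and motif pairing can look like, the following minimal sketch scores each pairing with a rank-based statistic: the probability that a gene in the set has a higher CRM score than a gene outside it (an area-under-the-curve value, where 0.5 means no enrichment). The abstract does not specify which statistic Lever uses, so this choice is an assumption, and the function names, gene names, motif names and scores are hypothetical placeholders rather than PhylCRM output.

```python
from typing import Dict, List, Set, Tuple


def enrichment_auc(crm_scores: Dict[str, float], gene_set: Set[str]) -> float:
    """Probability that a gene in gene_set outscores a gene outside it.

    crm_scores maps each gene to a (hypothetical) best CRM score for one motif.
    Ties count as half a win, so 0.5 means no enrichment and 1.0 means every
    gene in the set outranks every background gene.
    """
    foreground = [s for g, s in crm_scores.items() if g in gene_set]
    background = [s for g, s in crm_scores.items() if g not in gene_set]
    if not foreground or not background:
        raise ValueError("need at least one foreground and one background gene")
    wins = 0.0
    for f in foreground:
        for b in background:
            if f > b:
                wins += 1.0
            elif f == b:
                wins += 0.5
    return wins / (len(foreground) * len(background))


def screen(
    scores_by_motif: Dict[str, Dict[str, float]],
    gene_sets: Dict[str, Set[str]],
) -> List[Tuple[str, str, float]]:
    """Score every (gene set, motif) pairing and sort by decreasing enrichment."""
    results = [
        (set_name, motif, enrichment_auc(scores, genes))
        for motif, scores in scores_by_motif.items()
        for set_name, genes in gene_sets.items()
    ]
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical toy inputs: two motifs, five genes, two gene sets.
scores_by_motif = {
    "motif1": {"geneA": 9.1, "geneB": 7.4, "geneC": 2.0, "geneD": 1.5, "geneE": 0.3},
    "motif2": {"geneA": 0.2, "geneB": 0.9, "geneC": 5.5, "geneD": 6.1, "geneE": 0.4},
}
gene_sets = {"set1": {"geneA", "geneB"}, "set2": {"geneC", "geneD"}}

ranked = screen(scores_by_motif, gene_sets)
for set_name, motif, auc in ranked:
    print(f"{set_name}\t{motif}\t{auc:.2f}")
```

A real screen over tens of thousands of pairings would also need a significance estimate and a correction for multiple hypothesis testing, which this sketch omits.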

Figure 1: Lever schema.
Figure 2: Analysis of the time course of human skeletal muscle differentiation.
Figure 3: Lever screen of 101 myogenic gene sets using a dictionary of 174 motifs.
Figure 4: Experimental validation of computationally predicted CRMs.

Accession codes


Gene Expression Omnibus



We thank E. Margulies and the ENCODE Multiple Sequence Alignment working group for generously allowing use of their phylogenetic tree before its publication; S. Asthana, S. Sunyaev, G. Kryukov, M. Berger, T. Siggers and A. Aboukhalil for helpful discussions; J. Chee, E. Mathewson and T. Sierra for technical assistance; S. Elledge, A. Friedman, T. Siggers, M. Berger and F. De Masi for critical reading of the manuscript; A. Donner (Brigham & Women's Hospital) for the generous gift of human lens epithelial cells; and K. Cichowski (Brigham & Women's Hospital) for kindly providing lentiviral reagents. This work was funded in part by a PhRMA Foundation Informatics Research Starter Grant (M.L.B.), a William F. Milton Fund Award (M.L.B.), a Harvard-MIT Division of Health Sciences & Technology (HST) Taplin Award (M.L.B.) and US National Institutes of Health (NIH) National Human Genome Research Institute (R01 HG002966 to M.L.B.). J.B.W. was supported in part by an NIH Training Grant T32 HL07627 and NIH Individual National Research Service Award F32 AR051287. A.A.P. was supported in part by a National Defense Science and Engineering Graduate Fellowship from the Department of Defense and an Athinoula Martinos Fellowship from HST. S.A.J. was supported in part by a US National Science Foundation Postdoctoral Research Fellowship in Biological Informatics.

Author information

J.B.W. participated in the experimental design, performed the experiments, and participated in the analysis of the results and drafting of the manuscript. A.A.P. conceived of the PhylCRM scoring algorithm and participated in programming PhylCRM, running PhylCRM analyses, developing and programming Lever, running Lever analyses, analyzing the results and drafting the manuscript. S.A.J. optimized the performance of, and participated in programming, PhylCRM, and participated in running PhylCRM analyses, developing and programming Lever, running Lever analyses, analyzing the results and drafting the manuscript. F.S.H. assisted with programming PhylCRM and running PhylCRM analyses. J.L. assisted with the experiments. M.L.B. conceived of the study and participated in the study design, analysis of the results and drafting of the manuscript.

Correspondence to Martha L Bulyk.

Supplementary information

Supplementary Text and Figures

Supplementary figures 1–12, Supplementary Tables 1, 2, 4, Supplementary Methods, Supplementary Results (PDF 7182 kb)

Supplementary Table 3

Statistically significant GM pairs from Lever analyses. (XLS 3676 kb)
