We developed an algorithm, Lever, that systematically maps metazoan DNA regulatory motifs or motif combinations to sets of genes. Lever assesses whether the motifs are enriched in cis-regulatory modules (CRMs), predicted by our PhylCRM algorithm, in the noncoding sequences surrounding the genes. Lever analysis allows unbiased inference of functional annotations to regulatory motifs and candidate CRMs. We used human myogenic differentiation as a model system to statistically assess greater than 25,000 pairings of gene sets and motifs or motif combinations. We assigned functional annotations to candidate regulatory motifs predicted previously and identified gene sets that are likely to be co-regulated via shared regulatory motifs. Lever allows moving beyond the identification of putative regulatory motifs in mammalian genomes, toward understanding their biological roles. This approach is general and can be applied readily to any cell type, gene expression pattern or organism of interest.
Access optionsAccess options
Subscribe to Journal
Get full journal access for 1 year
only $20.17 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
Gene Expression Omnibus
Bulyk, M.L. Computational prediction of transcription-factor binding site locations. Genome Biol. 5, 201 (2003).
Blanchette, M. et al. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 16, 656–668 (2006).
Hallikas, O. et al. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124, 47–59 (2006).
Pennacchio, L.A. et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499–502 (2006).
Thompson, W., Palumbo, M.J., Wasserman, W.W., Liu, J.S. & Lawrence, C.E. Decoding human regulatory circuits. Genome Res. 14, 1967–1974 (2004).
Zhou, Q. & Wong, W.H. CisModule: de novo discovery of cis-regulatory modules by hierarchical mixture modeling. Proc. Natl. Acad. Sci. USA 101, 12114–12119 (2004).
Wasserman, W.W. & Fickett, J. Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol. 278, 167–181 (1998).
Philippakis, A.A., He, F.S. & Bulyk, M.L. Modulefinder: a tool for computational discovery of cis regulatory modules. Pac. Symp. Biocomput. 10, 519–530 (2005).
Xie, X. et al. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434, 338–345 (2005).
Elemento, O. & Tavazoie, S. Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach. Genome Biol. 6, R18 (2005).
Huber, B.R. & Bulyk, M.L. Meta-analysis discovery of tissue-specific DNA sequence motifs from mammalian gene expression data. BMC Bioinformatics 7, 229 (2006).
Ettwiller, L. et al. The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates. Genome Biol. 6, R104 (2005).
Bulyk, M.L. DNA microarray technologies for measuring protein-DNA interactions. Curr. Opin. Biotechnol. 17, 422–430 (2006).
Bulyk, M.L., Huang, X., Choo, Y. & Church, G.M. Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl. Acad. Sci. USA 98, 7158–7163 (2001).
Mukherjee, S. et al. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet. 36, 1331–1339 (2004).
Berger, M.F. et al. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 24, 1429–1435 (2006).
Philippakis, A.A. et al. Expression-guided in silico evaluation of candidate cis regulatory codes for Drosophila muscle founder cells. PLOS Comput. Biol. 2, e53 (2006).
Moses, A.M., Chiang, D.Y., Pollard, D.A., Iyer, V.N. & Eisen, M.B. MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol. 5, R98 (2004).
Margulies, E.H. et al. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 17, 760–774 (2007).
Messenguy, F. & Dubois, E. Role of MADS box proteins and their cofactors in combinatorial control of gene expression and cell development. Gene 316, 1–21 (2003).
Blais, A. et al. An initial blueprint for myogenic differentiation. Genes Dev. 19, 553–569 (2005).
Daury, L. et al. Opposing functions of ATF2 and Fos-like transcription factors in c-Jun-mediated myogenin expression and terminal differentiation of avian myoblasts. Oncogene 20, 7998–8008 (2001).
Wang, Z. et al. Myocardin and ternary complex factors compete for SRF to control smooth muscle gene expression. Nature 428, 185–189 (2004).
Martinez-Fernandez, S. et al. Pitx2c overexpression promotes cell proliferation and arrests differentiation in myoblasts. Dev. Dyn. 235, 2930–2939 (2006).
Gurtner, A. et al. Requirement for down-regulation of the CCAAT-binding activity of the NF-Y transcription factor during skeletal muscle differentiation. Mol. Biol. Cell 14, 2706–2715 (2003).
Ludwig, M.Z., Bergman, C., Patel, N.H. & Kreitman, M. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403, 564–567 (2000).
Wasserman, W.W., Palumbo, M., Thompson, W., Fickett, J. & Lawrence, C. Human-mouse genome comparisons to locate regulatory sites. Nat. Genet. 26, 225–228 (2000).
Kasabov, N.K. Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering (MIT Press, Cambridge, Massachusetts, 1998).
Mootha, V.K. et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).
Berriz, G.F., King, O.D., Bryant, B., Sander, C. & Roth, F.P. Characterizing gene sets with FuncAssociate. Bioinformatics 19, 2502–2504 (2003).
We thank E. Margulies and the ENCODE Multiple Sequence Alignment working group for generously allowing use of their phylogenetic tree before its publication; S. Asthana, S. Sunyaev, G. Kryukov, M. Berger, T. Siggers and A. Aboukhalil for helpful discussions; J. Chee, E. Mathewson and T. Sierra for technical assistance; S. Elledge, A. Friedman, T. Siggers, M. Berger and F. De Masi for critical reading of the manuscript; A. Donner (Brigham & Women's Hospital) for the generous gift of human lens epithelial cells; and K. Cichowski (Brigham & Women's Hospital) for kindly providing lentiviral reagents. This work was funded in part by a PhRMA Foundation Informatics Research Starter Grant (M.L.B.), a William F. Milton Fund Award (M.L.B.), a Harvard-MIT Division of Health Sciences & Technology (HST) Taplin Award (M.L.B.) and US National Institutes of Health (NIH) National Human Genome Research Institute (R01 HG002966 to M.L.B.). J.B.W. was supported in part by an NIH Training Grant T32 HL07627 and NIH Individual National Research Service Award F32 AR051287. A.A.P. was supported in part by a National Defense Science and Engineering Graduate Fellowship from the Department of Defense and an Athinoula Martinos Fellowship from HST. S.A.J. was supported in part by a US National Science Foundation Postdoctoral Research Fellowship in Biological Informatics.
About this article
Computational and Structural Biotechnology Journal (2019)
BMC Genomics (2018)
Briefings in Bioinformatics (2018)
Scientific Reports (2018)
Genome Biology and Evolution (2017)