Primer

How does DNA sequence motif discovery work?

How can we computationally extract an unknown motif from a set of target sequences? What are the principles behind the major motif discovery algorithms? Which of these should we use, and how do we know we've found a 'real' motif?

Figure 1: Starting from a single site, expectation maximization algorithms such as MEME (ref. 4) alternate between assigning sites to a motif (left) and updating the motif model (right).
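The caption summarizes the expectation maximization (EM) loop. The Python sketch below is a rough illustration of that loop, not MEME's implementation. It rests on several assumptions of our own that are not taken from the article:

- Each sequence contains exactly one site, analogous to MEME's "one occurrence per sequence" model.
- The background model is the pooled base composition.
- Pseudocounts are fixed at 0.5.
- The loop runs a fixed 20 iterations.

The function names (`em_motif`, `e_step`, `m_step`, `init_pwm`) are hypothetical. Site assignment here is soft: the E-step gives every possible start position a posterior weight instead of a yes/no call. MEME itself also tries many starting sites, supports other site-distribution models and scores motifs statistically.

```python
import math

BASES = "ACGT"
# Hypothetical, simplified one-site-per-sequence EM motif finder (not MEME).


def normalize(counts):
    """Turn a dict of base counts into probabilities."""
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def init_pwm(site, pseudocount=0.5):
    """Seed the motif model (a position weight matrix) from one starting site."""
    pwm = []
    for base in site:
        counts = {b: pseudocount for b in BASES}
        counts[base] += 1.0
        pwm.append(normalize(counts))
    return pwm


def e_step(seqs, pwm, bg):
    """Soft site assignment: posterior over start positions in each sequence.

    Assumes exactly one site per sequence at a uniformly random offset, so the
    posterior is proportional to the motif-vs-background likelihood ratio."""
    w = len(pwm)
    posteriors = []
    for seq in seqs:
        log_ratios = [
            sum(math.log(pwm[j][seq[i + j]] / bg[seq[i + j]]) for j in range(w))
            for i in range(len(seq) - w + 1)
        ]
        top = max(log_ratios)  # subtract the max for numerical stability
        weights = [math.exp(lr - top) for lr in log_ratios]
        z = sum(weights)
        posteriors.append([x / z for x in weights])
    return posteriors


def m_step(seqs, posteriors, w, pseudocount=0.5):
    """Update the motif model from posterior-weighted base counts."""
    counts = [{b: pseudocount for b in BASES} for _ in range(w)]
    for seq, post in zip(seqs, posteriors):
        for i, p in enumerate(post):
            for j in range(w):
                counts[j][seq[i + j]] += p
    return [normalize(col) for col in counts]


def em_motif(seqs, seed_seq, seed_pos, w, iterations=20):
    """Run EM from a single seed site; return the PWM, best sites, consensus."""
    joined = "".join(seqs)
    # Assumed background: overall base composition with +1 pseudocounts.
    bg = normalize({b: joined.count(b) + 1 for b in BASES})
    pwm = init_pwm(seqs[seed_seq][seed_pos:seed_pos + w])
    for _ in range(iterations):
        posteriors = e_step(seqs, pwm, bg)  # assign sites to the motif (soft)
        pwm = m_step(seqs, posteriors, w)   # update the motif model
    posteriors = e_step(seqs, pwm, bg)
    sites = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    consensus = "".join(max(col, key=col.get) for col in pwm)
    return pwm, sites, consensus


if __name__ == "__main__":
    # Toy input: a TTGACA site planted in GC-rich filler at different offsets.
    seqs = [
        "GCGCGTTGACAGCGCG",
        "TTGACACGCGCGCGCG",
        "CGCGCGCGCGTTGACA",
        "GCGTTGACAGCGCGCG",
    ]
    pwm, sites, consensus = em_motif(seqs, seed_seq=0, seed_pos=5, w=6)
    print(consensus, sites)
```

Seeding from one site and then iterating lets EM sharpen an initially crude model. A poor seed, however, can converge to a different local optimum, which is one reason practical tools try many seeds.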

References

  1. D'haeseleer, P. What are DNA sequence motifs? Nat. Biotechnol. 24, 423–425 (2006).

  2. Sinha, S. & Tompa, M. YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res. 31, 3586–3588 (2003).

  3. Pavesi, G. et al. Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res. 32 (Web Server Issue), W199–W203 (2004).

  4. Bailey, T.L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).

  5. Tompa, M. et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23, 137–144 (2005).

  6. Li, N. & Tompa, M. Analysis of computational approaches for motif discovery. Algorithms Mol. Biol. 1, 8 (2006).

  7. Hu, J., Li, B. & Kihara, D. Limitations and potentials of current motif discovery algorithms. Nucleic Acids Res. 33, 4899–4913 (2005).

  8. Thijs, G. et al. A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J. Comput. Biol. 9, 447–464 (2002).

  9. Huber, B.R. & Bulyk, M.L. Meta-analysis discovery of tissue-specific DNA sequence motifs from mammalian gene expression data. BMC Bioinformatics 7, 229 (2006).

  10. Hughes, J.D. et al. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol. 296, 1205–1214 (2000).

  11. McGuire, A.M., Hughes, J.D. & Church, G.M. Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. Genome Res. 10, 744–757 (2000).

  12. Huang, H.-D. et al. Identifying transcriptional regulatory sites in the human genome using an integrated system. Nucleic Acids Res. 32, 1948–1956 (2004).

About this article

Cite this article

D'haeseleer, P. How does DNA sequence motif discovery work? Nat Biotechnol 24, 959–961 (2006). https://doi.org/10.1038/nbt0806-959