  • Technical Report

Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites

Abstract

Identifying co-regulated genes and their transcription-factor binding sites (TFBS) is a key step toward understanding transcription regulation. In addition to effective laboratory assays, various computational approaches have been developed for detecting TFBS in the promoter regions of coexpressed genes. The availability of complete genome sequences, combined with the likelihood that transcription factors and their cognate sites are often conserved during evolution, has led to the development of phylogenetic footprinting [1,2]. This technique searches for conserved motifs upstream of orthologous genes from closely related species [1,2], and it can identify hundreds of TFBS without prior knowledge of co-regulation or coexpression. Because many of these predicted sites are likely to be bound by the same transcription factor, motifs with similar patterns can be grouped into clusters to infer the sets of co-regulated genes, that is, the regulons. This strategy uses only genome sequence information, and it both complements and can corroborate gene expression data generated by microarray experiments. However, the inference problem is difficult for three reasons: the data available to characterize each binding pattern are limited; motifs vary in alignment, width, and base conservation; and the number and sizes of the regulons are unknown. We have developed a Gibbs sampling-based [3] Bayesian motif clustering (BMC) algorithm to address these challenges. Tests on simulated data sets show that BMC produces far fewer errors than hierarchical and K-means clustering methods [4]. Applied to hundreds of predicted γ-proteobacterial motifs [2], BMC correctly identified many experimentally reported regulons, inferred the existence of previously unreported members of these regulons, and suggested novel regulons.
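
To make the clustering idea concrete, the following is a minimal sketch of Gibbs-sampling-based clustering of motif count matrices. It is not the BMC algorithm itself, whose details are not given in this abstract. It assumes every motif has the same width and is already aligned, so it ignores the variation in alignment and width that the abstract names as a challenge. The function names (`log_beta`, `log_predictive`, `gibbs_cluster`), the Dirichlet pseudocount `alpha`, and the new-cluster weight `new_weight` are all illustrative assumptions. Each motif is a list of columns, and each column holds the counts of A, C, G, and T among the motif's predicted sites.

```python
import math
import random


def log_beta(values):
    """Log of the multivariate Beta function for a list of positive numbers."""
    return sum(math.lgamma(v) for v in values) - math.lgamma(sum(values))


def log_predictive(motif, pooled, alpha):
    """Log marginal likelihood of a motif's counts given a cluster's pooled counts.

    Each column is treated as multinomial with a Dirichlet(alpha) prior. The
    multinomial coefficient is omitted because it is identical for every
    candidate cluster and cancels when the weights are normalized.
    """
    total = 0.0
    for motif_col, pooled_col in zip(motif, pooled):
        prior = [c + alpha for c in pooled_col]
        posterior = [p + m for p, m in zip(prior, motif_col)]
        total += log_beta(posterior) - log_beta(prior)
    return total


def gibbs_cluster(motifs, alpha=0.5, new_weight=1.0, sweeps=50, seed=0):
    """Assign a cluster label to each motif (assumed: equal width, pre-aligned)."""
    rng = random.Random(seed)
    width = len(motifs[0])
    empty = [[0] * 4 for _ in range(width)]
    labels = list(range(len(motifs)))  # start with every motif in its own cluster

    for _ in range(sweeps):
        for i, motif in enumerate(motifs):
            labels[i] = None  # take motif i out of its current cluster

            # Pool the counts of the remaining motifs, cluster by cluster.
            pooled, size = {}, {}
            for label, other in zip(labels, motifs):
                if label is None:
                    continue
                if label not in pooled:
                    pooled[label] = [[0] * 4 for _ in range(width)]
                    size[label] = 0
                for col, other_col in zip(pooled[label], other):
                    for base in range(4):
                        col[base] += other_col[base]
                size[label] += 1

            # Weight of joining each existing cluster: cluster size times fit.
            options = list(pooled)
            log_w = [math.log(size[k]) + log_predictive(motif, pooled[k], alpha)
                     for k in options]

            # Weight of opening a new cluster, so the number of clusters
            # does not have to be fixed in advance.
            options.append(max(pooled, default=-1) + 1)
            log_w.append(math.log(new_weight) + log_predictive(motif, empty, alpha))

            top = max(log_w)  # subtract the maximum for numerical stability
            weights = [math.exp(w - top) for w in log_w]
            labels[i] = rng.choices(options, weights=weights)[0]
    return labels
```

Calling `gibbs_cluster(motifs)` returns one integer label per motif; motifs that share a label form a putative regulon. Letting each motif open a new cluster with weight `new_weight` is one common way to handle an unknown number of clusters, and is used here only as an assumption for illustration.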

Figure 1: The procedure for simulating motif data.
Figure 2: MetJ motif alignment.
Figure 3: Motif patterns for the clusters reported in Table 2.

References

  1. McCue, L.A. et al. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 29, 774–782 (2001).

  2. McCue, L.A., Thompson, W., Carmack, C.S. & Lawrence, C.E. Factors influencing the identification of transcription factor binding sites by cross-species comparison. Genome Res. 12, 1523–1532 (2002).

  3. Liu, J.S. Monte Carlo Strategies in Scientific Computing (Springer, New York, 2001).

  4. Everitt, B.S., Landau, S. & Leese, M. Cluster Analysis, edn. 4 (Arnold, London, 2001).

  5. Pietrokovski, S. Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Res. 24, 3836–3845 (1996).

  6. Hughes, J.D., Estep, P.W., Tavazoie, S. & Church, G.M. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol. 296, 1205–1214 (2000).

  7. van Nimwegen, E., Zavolan, M., Rajewsky, N. & Siggia, E.D. Probabilistic clustering of sequences: inferring new bacterial regulons by comparative genomics. Proc. Natl. Acad. Sci. USA 99, 7323–7328 (2002).

  8. Robison, K., McGuire, A.M. & Church, G.M. A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome. J. Mol. Biol. 284, 241–254 (1998).

  9. Courcelle, J., Khodursky, A., Peter, B., Brown, P.O. & Hanawalt, P.C. Comparative gene expression profiles following UV exposure in wild-type and SOS-deficient Escherichia coli. Genetics 158, 41–64 (2001).

  10. Hantke, K. Iron and metal regulation in bacteria. Curr. Opin. Microbiol. 4, 172–177 (2001).

  11. Vassinova, N. & Kozyrev, D. A method for direct cloning of Fur-regulated genes: identification of seven new Fur-regulated loci in Escherichia coli. Microbiology 146, 3171–3182 (2000).

  12. Escolar, L., Perez-Martin, J. & de Lorenzo, V. Opening the iron box: transcriptional metalloregulation by the Fur protein. J. Bacteriol. 181, 6223–6229 (1999).

  13. Jordan, A. & Reichard, P. Ribonucleotide reductases. Annu. Rev. Biochem. 67, 71–98 (1998).

  14. Han, J.S., Kwon, H.S., Yim, J.-B. & Hwang, D.S. Effect of IciA protein on the expression of the nrd gene encoding ribonucleoside diphosphate reductase in E. coli. Mol. Gen. Genet. 259, 610–614 (1998).

  15. van den Berg, E.A., Geerse, R.H., Memelink, J., Bovenberg, R.A., Magnee, F.A. & van der Putte, P. Analysis of regulatory sequences upstream of the E. coli uvrB gene; involvement of the DnaA protein. Nucleic Acids Res. 13, 1829–1840 (1985).

  16. Neidhardt, F.C. Escherichia coli and Salmonella: Cellular and Molecular Biology (ASM Press, Washington, DC, 1996).

  17. Liu, J.S., Neuwald, A.F. & Lawrence, C.E. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc. 90, 1156–1170 (1995).

  18. Azam, T.A. & Ishihama, A. Twelve species of the nucleoid-associated protein from Escherichia coli. Sequence recognition specificity and DNA binding affinity. J. Biol. Chem. 274, 33105–33113 (1999).

  19. Jeon, Y., Lee, Y.S., Han, J.S., Kim, J.B. & Hwang, D.S. Multimerization of phosphorylated and non-phosphorylated ArcA is necessary for the response regulator function of the Arc two-component signal transduction system. J. Biol. Chem. 276, 40873–40879 (2001).

  20. Schneider, T.D. & Stephens, R.M. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100 (1990).

Acknowledgements

This research was partly supported by US National Institutes of Health (NIH) grant R01HG02518-01 and National Science Foundation grants DMS-0104129 and DMS-0204674 to J.S.L., NIH grant R01HG01257 to C.E.L., and Department of Energy grant DEFG0201ER63204 to C.E.L. and L.A.M.

Author information

Corresponding author

Correspondence to Jun S. Liu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Qin, Z., McCue, L., Thompson, W. et al. Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites. Nat Biotechnol 21, 435–439 (2003). https://doi.org/10.1038/nbt802

  • DOI: https://doi.org/10.1038/nbt802
