Technical Report
Nature Biotechnology  21, 435 - 439 (2003)
Published online: 10 March 2003 | doi:10.1038/nbt802

Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites

Zhaohui S. Qin1, Lee Ann McCue2, William Thompson2, Linda Mayerhofer2, Charles E. Lawrence2, 3 & Jun S. Liu1

1  Department of Statistics, Harvard University, Cambridge, MA 02138.

2  The Wadsworth Center, New York State Department of Health, Albany, NY 12201.

3  Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY 12180.

Correspondence should be addressed to Jun S. Liu (jliu@stat.harvard.edu).
Identifying co-regulated genes and locating their transcription-factor binding sites (TFBS) are key steps toward understanding transcription regulation. In addition to effective laboratory assays, various computational approaches have been developed for detecting TFBS in the promoter regions of coexpressed genes. The availability of complete genome sequences, combined with the likelihood that transcription factors and their cognate sites are often conserved during evolution, has led to the development of phylogenetic footprinting [1, 2]. This technique searches for conserved motifs upstream of orthologous genes from closely related species [1, 2]. The method can identify hundreds of TFBS without prior knowledge of co-regulation or coexpression. Because many of these predicted sites are likely to be bound by the same transcription factor, motifs with similar patterns can be grouped into clusters to infer the sets of co-regulated genes, that is, the regulons. This strategy uses only genome sequence information, and it both complements and can corroborate gene expression data generated by microarray experiments. However, the inference problem is difficult for three reasons: the data available to characterize individual binding patterns are limited; motifs vary in alignment, width, and base conservation; and the number and sizes of the regulons are unknown. We have developed a Gibbs sampling-based [3] Bayesian motif clustering (BMC) algorithm to address these challenges. Tests on simulated data sets show that BMC produces far fewer errors than hierarchical and K-means clustering methods [4]. The application of BMC to hundreds of predicted γ-proteobacterial motifs [2] correctly identified many experimentally reported regulons, inferred the existence of previously unreported members of these regulons, and suggested novel regulons.
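
The general idea of clustering motifs by Gibbs sampling when the number of clusters is unknown can be illustrated with a minimal sketch. This is not the authors' BMC algorithm: it assumes every motif is already aligned and has the same width (BMC is described as handling variation in both), it models each motif column with a Dirichlet-multinomial, and it uses a Chinese-restaurant-process prior over cluster assignments. The function names and the parameters `alpha` (tendency to open a new cluster) and `beta` (base pseudocount) are hypothetical choices made for illustration.

```python
import math
import random


def log_beta(col):
    """Log of the multivariate beta function for one column of positive pseudocounts."""
    return sum(math.lgamma(x) for x in col) - math.lgamma(sum(col))


def log_predictive(motif, cluster_counts, beta=0.5):
    """Log marginal probability (up to a constant shared by all clusters) of a
    motif's base counts given the pooled counts of a cluster, column by column."""
    total = 0.0
    for m_col, c_col in zip(motif, cluster_counts):
        prior = [c + beta for c in c_col]
        post = [p + m for p, m in zip(prior, m_col)]
        total += log_beta(post) - log_beta(prior)
    return total


def gibbs_cluster(motifs, alpha=1.0, beta=0.5, sweeps=50, seed=0):
    """Collapsed Gibbs sampler over cluster labels (illustrative assumption,
    not the published method). Each motif is a list of [A, C, G, T] counts."""
    rng = random.Random(seed)
    width = len(motifs[0])
    labels = list(range(len(motifs)))  # start with each motif in its own cluster
    for _ in range(sweeps):
        for i, motif in enumerate(motifs):
            # Pool the counts of all other motifs by their current label.
            pooled, sizes = {}, {}
            for j, other in enumerate(motifs):
                if j == i:
                    continue
                k = labels[j]
                if k not in pooled:
                    pooled[k] = [[0] * 4 for _ in range(width)]
                    sizes[k] = 0
                for w in range(width):
                    for b in range(4):
                        pooled[k][w][b] += other[w][b]
                sizes[k] += 1
            # Score joining each existing cluster, or opening a new one.
            candidates = list(pooled)
            log_w = [math.log(sizes[k]) + log_predictive(motif, pooled[k], beta)
                     for k in candidates]
            empty = [[0] * 4 for _ in range(width)]
            candidates.append(max(labels) + 1)  # a label not currently in use
            log_w.append(math.log(alpha) + log_predictive(motif, empty, beta))
            top = max(log_w)
            weights = [math.exp(x - top) for x in log_w]
            labels[i] = rng.choices(candidates, weights=weights)[0]
    return labels


# Toy input: three motifs dominated by A and three dominated by T (made-up counts).
a_rich = [[20, 0, 0, 0]] * 6
t_rich = [[0, 0, 0, 20]] * 6
print(gibbs_cluster([a_rich, a_rich, a_rich, t_rich, t_rich, t_rich]))
```

Because the number of clusters is sampled rather than fixed in advance, a sketch like this does not need the regulon count as input, which is the property the abstract highlights as lacking in K-means-style approaches.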

Nature Biotechnology
ISSN: 1087-0156
EISSN: 1546-1696
© 2003 Nature Publishing Group