An algorithm for finding protein–DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments

Liu, X. Shirley; Brutlag, Douglas L.; Liu, Jun S.

doi:10.1038/nbt717

Technical Report
Published: 08 July 2002

Nature Biotechnology volume 20, pages 835–839 (2002)

Abstract

Chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP–array) has become a popular procedure for studying genome-wide protein–DNA interactions and transcription regulation. However, it can map probable protein–DNA interaction loci only to within 1–2 kilobases. To pinpoint interaction sites down to the base-pair level, we introduce a computational method, Motif Discovery scan (MDscan), that examines the ChIP–array-selected sequences and searches for DNA sequence motifs representing the protein–DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration^1,2,3,4 and position-specific weight matrix updating^5,6,7,8,9, and incorporates the ChIP–array ranking information to accelerate searches and improve their success rates. MDscan correctly identified all the experimentally verified motifs from published ChIP–array experiments in yeast^10,11,12,13 (STE12, GAL4, RAP1, SCB, MCB, MCM1, SFF, and SWI5), and predicted two motif patterns for the differential binding of Rap1 protein in telomere regions. In our studies, the method was faster and more accurate than several established motif-finding algorithms^5,8,9. MDscan can be used to find DNA motifs not only in ChIP–array experiments but also in other experiments in which a subgroup of the sequences can be inferred to contain relatively abundant motif sites. The MDscan web server can be accessed at http://BioProspector.stanford.edu/MDscan/.
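To make the combination of the two strategies concrete, here is a minimal Python sketch, assuming a much-simplified scheme that is not MDscan's published procedure: enumerate every length-w word in the top-ranked sequences only (this is where the ranking information enters), gather close matches to each word, turn them into a position-specific weight matrix (PWM), rank the seeds, and then update the best PWM by rescanning all sequences. Every function name (`seed_motifs`, `refine`, `build_pwm`, `log_odds`, `consensus`) and every parameter (`w`, `top`, `max_mm`, `pseudocount`, `threshold`) is hypothetical; the summed log-odds score, the uniform 0.25 background, forward-strand-only scanning, and the hard-threshold update loop are illustrative stand-ins for the paper's actual scoring and updating rules.

```python
import math
from collections import Counter

BASES = "ACGT"

def kmers(seq, w):
    """All length-w windows of seq (forward strand only, an assumption)."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def mismatches(a, b):
    """Hamming distance between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def build_pwm(sites, pseudocount=0.5):
    """Column-wise base probabilities from aligned sites, with a pseudocount."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for j in range(len(sites[0])):
        counts = Counter(s[j] for s in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm

def log_odds(word, pwm, background=0.25):
    """Log-likelihood ratio of word under the PWM vs. a uniform background."""
    return sum(math.log(col[b] / background) for b, col in zip(word, pwm))

def consensus(pwm):
    """Most probable base in each column."""
    return "".join(max(col, key=col.get) for col in pwm)

def seed_motifs(ranked_seqs, w=6, top=5, max_mm=1, n_seeds=3):
    """Word enumeration restricted to the `top` highest-ranked sequences."""
    top_seqs = ranked_seqs[:top]
    windows = [k for s in top_seqs for k in kmers(s, w)]
    seeds = []
    for word in set(windows):
        # Sites are windows within max_mm mismatches of the candidate word.
        sites = [k for k in windows if mismatches(k, word) <= max_mm]
        pwm = build_pwm(sites)
        score = sum(log_odds(k, pwm) for k in sites)  # hypothetical score
        seeds.append((score, word, pwm))
    seeds.sort(key=lambda t: t[0], reverse=True)
    return seeds[:n_seeds]

def refine(pwm, seqs, threshold=0.0, rounds=3):
    """Simple PWM update: rescan every sequence, keep windows scoring above
    threshold, and re-estimate the matrix from them."""
    w = len(pwm)
    for _ in range(rounds):
        sites = [k for s in seqs for k in kmers(s, w) if log_odds(k, pwm) > threshold]
        if not sites:
            break
        pwm = build_pwm(sites)
    return pwm

if __name__ == "__main__":
    # Toy data: sequences ordered by (made-up) ChIP enrichment rank.
    ranked = ["ACGTTGAAACAC", "GGTGAAACTT", "TTTGAAACGA", "CCCCCCCCCC"]
    best_score, best_word, best_pwm = seed_motifs(ranked, w=6, top=3, max_mm=0)[0]
    print(best_word, consensus(refine(best_pwm, ranked)))
```

In this toy run only the three highest-ranked sequences are used for seeding, so the low-ranked poly-C sequence cannot contribute candidate words; it is seen again only during the update step. Restricting enumeration to the top of the ranking keeps the candidate set small, which is one plausible reading of how ranking information can speed up a search.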

References

1. van Helden, J., Andre, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998).
2. Bussemaker, H.J., Li, H. & Siggia, E.D. Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. Proc. Natl. Acad. Sci. USA 97, 10096–10100 (2000).
3. Sinha, S. & Tompa, M. A statistical method for finding transcription factor binding sites. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 344–354 (2000).
4. Vilo, J., Brazma, A., Jonassen, I., Robinson, A. & Ukkonen, E. Mining for putative regulatory elements in the yeast genome using gene expression data. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 384–394 (2000).
5. Hertz, G.Z., Hartzell, G.W. & Stormo, G.D. Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput. Appl. Biosci. 6, 81–92 (1990).
6. Bailey, T.L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
7. Liu, J.S., Neuwald, A.F. & Lawrence, C.E. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc. 90, 1156–1170 (1995).
8. Roth, F.P., Hughes, J.D., Estep, P.W. & Church, G.M. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol. 16, 939–945 (1998).
9. Liu, X., Brutlag, D.L. & Liu, J.S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput. 127–138 (2001).
10. Ren, B. et al. Genome-wide location and function of DNA binding proteins. Science 290, 2306–2309 (2000).
11. Iyer, V.R. et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538 (2001).
12. Lieb, J.D., Liu, X., Botstein, D. & Brown, P.O. Promoter-specific binding of Rap1p revealed by genome-wide maps of protein-DNA association. Nat. Genet. 28, 327–334 (2001).
13. Simon, I. et al. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 106, 697–708 (2001).
14. Dolan, J.W., Kirkman, C. & Fields, S. The yeast STE12 protein binds to the DNA sequence mediating pheromone induction. Proc. Natl. Acad. Sci. USA 86, 5703–5707 (1989).
15. Graham, I.R. & Chambers, A. Use of a selection technique to identify the diversity of binding sites for the yeast RAP1 transcription factor. Nucleic Acids Res. 22, 124–130 (1994).
16. Buchman, A.R., Kimmerly, W.J., Rine, J. & Kornberg, R.D. Two DNA-binding factors recognize specific sequences at silencers, upstream activating sequences, autonomously replicating sequences, and telomeres in Saccharomyces cerevisiae. Mol. Cell. Biol. 8, 210–225 (1988).
17. Idrissi, F.Z. & Pina, B. Functional divergence between the half-sites of the DNA-binding sequence for the yeast transcriptional regulator Rap1p. Biochem. J. 341, 477–482 (1999).
18. Spellman, P.T. et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297 (1998).
19. Lawrence, C.E. et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993).
20. Liu, J.S. Monte Carlo Strategies in Scientific Computing (Springer, New York, 2001).

Acknowledgements

The authors thank the Brown lab at Stanford (especially Jason D. Lieb) and the Young lab at MIT (especially Bing Ren) for their valuable data and scientific insight. This work is supported by National Human Genome Research Institute grants R01 HGF02235 and R01 HG02518-01, and National Science Foundation grant DMS-0094613.

Author information

Authors and Affiliations

Stanford Medical Informatics, Stanford University, Stanford, 94305, CA
X. Shirley Liu
Department of Biochemistry, Stanford University, Stanford, 94305, CA
Douglas L. Brutlag
Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, 02138, MA
Jun S. Liu

Corresponding author

Correspondence to Jun S. Liu.

About this article

Cite this article

Liu, X., Brutlag, D. & Liu, J. An algorithm for finding protein–DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat Biotechnol 20, 835–839 (2002). https://doi.org/10.1038/nbt717

Received: 21 August 2001
Accepted: 10 May 2002
Published: 08 July 2002
Issue Date: 01 August 2002
DOI: https://doi.org/10.1038/nbt717
