Chromatin immunoprecipitation followed by cDNA microarray
hybridization (ChIP−array) has become a popular procedure for studying
genome-wide protein−DNA interactions and transcription regulation.
However, it can map the probable protein−DNA interaction loci only to within
1−2 kilobases. To pinpoint interaction sites down to the
base-pair level, we introduce a computational method, Motif Discovery scan
(MDscan), that examines the ChIP−array-selected sequences and searches
for DNA sequence motifs representing the protein−DNA interaction sites.
MDscan combines the advantages of two widely adopted motif search strategies,
word enumeration [1-4] and position-specific weight matrix updating [5-9], and
incorporates the ChIP−array ranking information to accelerate searches and
enhance their success rates.
MDscan correctly identified all the experimentally verified motifs from
published ChIP−array experiments in yeast [10-13]
(STE12, GAL4, RAP1, SCB, MCB, MCM1, SFF, and SWI5), and predicted two motif
patterns for the differential binding of Rap1 protein in telomere regions. In
our studies, the method was faster and more accurate than several established
motif-finding algorithms [5, 8, 9]. MDscan can be used to find DNA
motifs not only in ChIP−array experiments but also in other experiments
in which a subgroup of the sequences can be inferred to contain relatively
abundant motif sites. The MDscan web server can be accessed at
http://BioProspector.stanford.edu/MDscan/.
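
As a rough illustration of how word enumeration, weight matrices, and ranking information can be combined, the following Python sketch enumerates every word of width w in the top-ranked sequences, groups words that closely match each seed word, and builds a position-specific frequency matrix from each group. It is a minimal sketch under stated assumptions, not the MDscan implementation: the function names, the pseudocount, the matching threshold, and the toy score (group size weighted by information content) are all hypothetical choices made here, and the subsequent step of updating the matrices with the lower-ranked sequences is omitted.

```python
from collections import Counter
from math import log2


def count_matches(a, b):
    """Number of positions at which two equal-length words agree."""
    return sum(x == y for x, y in zip(a, b))


def enumerate_words(seqs, w):
    """Yield every word of width w found in the given sequences."""
    for s in seqs:
        for i in range(len(s) - w + 1):
            yield s[i:i + w]


def build_matrix(words, w, pseudo=0.5):
    """Position-specific frequency matrix with pseudocounts (assumed value)."""
    total = len(words) + 4 * pseudo
    matrix = []
    for j in range(w):
        counts = Counter(word[j] for word in words)
        matrix.append({b: (counts[b] + pseudo) / total for b in "ACGT"})
    return matrix


def information_content(matrix, background=0.25):
    """Relative entropy (bits) against a uniform background, summed over columns."""
    return sum(p * log2(p / background) for col in matrix for p in col.values())


def seed_motifs(ranked_seqs, w, top=3, min_match=None, keep=5):
    """Return the best-scoring (score, seed, matrix) candidates.

    ranked_seqs is assumed to be sorted from strongest to weakest
    ChIP enrichment; only the first `top` sequences are searched.
    """
    if min_match is None:
        min_match = w - 1  # assumption: allow one mismatch to the seed
    words = list(enumerate_words(ranked_seqs[:top], w))
    candidates = []
    for seed in sorted(set(words)):
        similar = [x for x in words if count_matches(seed, x) >= min_match]
        if len(similar) < 2:
            continue  # a seed matched only by itself carries no evidence
        matrix = build_matrix(similar, w)
        # Toy score (hypothetical): larger, more conserved groups rank higher.
        score = log2(len(similar)) * information_content(matrix)
        candidates.append((score, seed, matrix))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:keep]


if __name__ == "__main__":
    # Made-up sequences with the word ACGTGA planted in the top three.
    ranked = ["TTACGTGAAT", "GGACGTGACC", "CAACGTGATG", "TTTTTTTTTT"]
    best_score, best_seed, _ = seed_motifs(ranked, w=6)[0]
    print(best_seed)
```

Restricting the enumeration to the top-ranked sequences keeps the number of seed words small, which is one plausible reading of how the ranking information accelerates the search; the sequences in the usage example are invented for illustration only.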