A question commonly asked by researchers analysing genomic data is this: given a set of microarray experiments in which the activities of different trans-acting regulatory proteins vary, is it possible to predict the DNA motifs that these proteins bind upstream of the regulated genes? A paper just published in Microbiology describes an approach that addresses this issue.

In an era in which huge genomic and functional genomic data sets are being generated on a daily basis, a challenge for biologists is to develop techniques that extract useful information to inform and guide further experimental investigation. In this study, Haluk Resat and colleagues searched the genome of the photosynthetic bacterium Rhodobacter sphaeroides for DNA motifs that bind three transcription factors known to regulate photosynthetic gene expression: PrrA, PpsR and FnrL. The authors first performed a hierarchical clustering of R. sphaeroides genes using microarray mRNA expression data, to identify genes that showed similar expression patterns under different experimental conditions. Second, they analysed the DNA sequences upstream of these genes for signature sites that suggested possible co-regulation. These sites were then used to generate predicted consensus sequences, which formed the basis of a genome-wide search for putative new target genes for these regulators.
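The first and third steps of this workflow can be illustrated with a minimal Python sketch. It is not the authors' implementation: the distance measure (1 minus the Pearson correlation), the average-linkage rule, the cut-off values, the gene names, the expression values and the aligned sites are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def cluster_genes(profiles, max_distance=0.5):
    """Agglomerative (bottom-up hierarchical) clustering with average linkage.

    Merging stops once the closest pair of clusters is further apart than
    max_distance, where distance is 1 - Pearson r (an assumed measure).
    """
    clusters = [[gene] for gene in profiles]

    def dist(c1, c2):
        return mean(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        if dist(clusters[i], clusters[j]) > max_distance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


def consensus(sites, min_fraction=0.75):
    """Consensus of equal-length aligned sites; 'N' marks variable positions."""
    out = []
    for column in zip(*sites):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


# Hypothetical expression profiles (one value per experimental condition).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.5, 2.0],
}
groups = cluster_genes(profiles)

# Hypothetical aligned upstream sites from one cluster (not real binding sites).
sites = ["TTGATCA", "TTGACCA", "TTGATCA", "TTGAGCA"]
motif = consensus(sites)
```

Here `groups` pairs the genes whose expression rises together and those whose expression falls together, and `motif` keeps a base only where at least three of the four made-up sites agree. A real analysis would also need a motif-discovery step to locate and align the sites within the upstream regions, which this sketch takes as given.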

The transition of Rhodobacter sphaeroides from an aerobic to an anaerobic environment triggers regulatory events that result in the formation of photosynthetic membranes (indicated with arrows) under the regulatory control of PpsR, FnrL and PrrA. Image courtesy of C. Mackenzie, The University of Texas Health Science Center, USA.

As a validation of the approach, Mao et al. independently identified PpsR and FnrL binding sites that were consistent with previously published consensus sequences for these transcription factors. The authors also extended the number of possible target genes regulated by these proteins. Further analysis of the PrrA DNA-binding sequence indicated that it consists of two conserved elements separated by a gap of variable size. Last, a genome-wide search of R. sphaeroides using the three consensus sequences revealed that the PrrA regulon was considerably larger than those of PpsR and FnrL, providing evidence that PrrA is a global regulator of gene expression in this microorganism.
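A motif built from two conserved elements with a variable gap between them lends itself to a simple pattern search. The Python sketch below scans both strands of a sequence for such a site. The half-site sequences, the permitted gap range and the test sequence are hypothetical placeholders rather than the published PrrA consensus, and exact string matching stands in for whatever scoring scheme the authors used.

```python
import re


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_gapped_sites(seq, left, right, min_gap, max_gap):
    """Find left-gap-right sites on both strands.

    Returns sorted (start, strand, gap_length, site) tuples, where start is
    the 0-based position of the site's leftmost base on the forward strand.
    The lookahead allows overlapping sites; the lazy quantifier reports only
    the shortest permitted gap for each start position.
    """
    pattern = re.compile(
        f"(?=({left}[ACGT]{{{min_gap},{max_gap}}}?{right}))"
    )
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            site = m.group(1)
            gap = len(site) - len(left) - len(right)
            if strand == "+":
                start = m.start()
            else:
                # Convert reverse-strand coordinates back to the forward strand.
                start = len(seq) - m.start() - len(site)
            hits.append((start, strand, gap, site))
    return sorted(hits)


# Hypothetical half-sites and gap range, chosen only for illustration.
LEFT, RIGHT = "GCGAC", "GTCGC"
upstream = "AAGCGACTTTTGTCGCAA"
hits = find_gapped_sites(upstream, LEFT, RIGHT, min_gap=3, max_gap=6)
```

Because these placeholder half-sites are reverse complements of each other, the same location is reported on both strands. A genome-wide search would usually tolerate mismatches as well, for example by scoring candidate sites against a position weight matrix, at the cost of more false positives.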

The authors note that, as with all prediction techniques, false-positive and false-negative results are possible; however, the technique is sufficiently robust to be useful for predicting genes regulated by these transcription factors. The approach should also be applicable to additional gene clusters derived from microarray data, and should facilitate the identification of regulatory elements crucial to other biological processes.