Article

Combinatorial binding predicts spatio-temporal cis-regulatory activity

Nature volume 462, pages 65–70 (05 November 2009)

Abstract

Development requires the establishment of precise patterns of gene expression, which are primarily controlled by transcription factors binding to cis-regulatory modules. Although transcription factor occupancy can now be identified at genome-wide scales, decoding this regulatory landscape remains a daunting challenge. Here we used a novel approach to predict spatio-temporal cis-regulatory activity based only on in vivo transcription factor binding and enhancer activity data. We generated a high-resolution atlas of cis-regulatory modules describing their temporal and combinatorial occupancy during Drosophila mesoderm development. The binding profiles of cis-regulatory modules with characterized expression were used to train support vector machines to predict five spatio-temporal expression patterns. In vivo transgenic reporter assays demonstrate the high accuracy of these predictions and reveal an unanticipated plasticity in transcription factor binding leading to similar expression. This data-driven approach does not require previous knowledge of transcription factor sequence affinity, function or expression, making it widely applicable.

Accessions

Data deposits

All ChIP data are available in ArrayExpress under accession numbers E-TABM-648, E-TABM-649, E-TABM-650, E-TABM-651 and E-TABM-652, and the array design under A-AFFY-53. The CRM coordinates and transcription factor occupancy are available at http://furlonglab.embl.de/.

Acknowledgements

We are grateful to M. Leptin for providing an independent assessment of the expression patterns driven by tested CRMs. We thank H. Gustafson for fly work, J. de Graaf for array hybridizations, S. Müller for embryo injections, and R. Bourgon for sharing code on signal peak identification. We thank all members of the Furlong laboratory for discussions and comments on the manuscript. This work was supported by a grant to E.E.M.F. and by a fellowship to R.P.Z. from the Human Frontiers Science Program.

Author Contributions M.B. performed ChIP experiments. R.P.Z., E.E.M.F. and C.G. generated CAD. R.P.Z. performed transgenic reporter experiments including in situ hybridizations and imaging. C.G. performed ChIP data analysis and motif analysis. J.G. devised the statistical and SVM analyses. E.E.M.F., R.P.Z., C.G. and J.G. formulated the hypotheses, designed experiments and wrote the manuscript.

Author information

Author notes

    Robert P. Zinzen, Charles Girardot & Julien Gagneur

    These authors contributed equally to this work.

Affiliations

  1. European Molecular Biology Laboratory, D-69117 Heidelberg, Germany

    Robert P. Zinzen, Charles Girardot, Julien Gagneur, Martina Braun & Eileen E. M. Furlong

Corresponding author

Correspondence to Eileen E. M. Furlong.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Methods, Supplementary Tables 1–3 and 12 (for Supplementary Tables 4–11 see separate files s2–s9), Supplementary Figures 1–15 with Legends and Supplementary References.

Text files

  1. Supplementary Table 4

    This file contains CAD entries in tabular text format. The source, name and coordinates of each CAD entry are given with anatomy ontology terms and cross references to FlyBase, PubMed, REDfly and FLDb (Furlong Db). An 'NR' key next to the source name indicates that the entry was modified during the CAD building process; in these cases, references to the original entries are available in the cross references (using REDfly and FLDb references). The file also contains embedded formatting comments. The associated CAD archive contains the various CAD input files as well as CAD in GFF format.

  2. Supplementary Table 5

    This file contains the CRM Atlas in tabular format. It provides the ID, location and binding events for each CRM Atlas entry.

  3. Supplementary Table 6

    This file contains the regions reported by TileMap before cut-off selection.

  4. Supplementary Table 7

    This file contains the TileMap regions used to build the CRM Atlas, together with peak position and height.

  5. Supplementary Table 8

    This file contains the training set for the Support Vector Machine.

  6. Supplementary Table 9

    This file contains the Support Vector Machine predictions.

  7. Supplementary Table 10

    This file contains the initial and optimized position weight matrices.

  8. Supplementary Table 11

    This file contains the cloned CRMs.

Zip files

  1. Supplementary Data

    This contains Supplementary File 1, which was added on 25 Mar 2010.

About this article

DOI

https://doi.org/10.1038/nature08531
