Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model

Journal name: Nature Genetics
Volume: 45
Pages: 1021–1028
DOI: 10.1038/ng.2713

Abstract

Despite continual progress in the cataloging of vertebrate regulatory elements, little is known about their organization and regulatory architecture. Here we describe a massively parallel experiment to systematically test the impact of copy number, spacing, combination and order of transcription factor binding sites on gene expression. A complex library of ~5,000 synthetic regulatory elements containing patterns from 12 liver-specific transcription factor binding sites was assayed in mice and in HepG2 cells. We find that certain transcription factors act as direct drivers of gene expression in homotypic clusters of binding sites, independent of spacing between sites, whereas others function only synergistically. Heterotypic enhancers are stronger than their homotypic analogs and favor specific transcription factor binding site combinations, mimicking putative native enhancers. Exhaustive testing of binding site permutations suggests that there is flexibility in binding site order. Our findings provide quantitative support for a flexible model of regulatory element activity and suggest a framework for the design of synthetic tissue-specific enhancers.

Figures

  1. Synthetic enhancer sequence design and controls.

    (a) SRESs consist of patterns of 12 consensus binding sequences arranged homotypically (class I) or heterotypically (class II and class III) on 1 of 2 neutral, 168-bp templates. (b) Schematic of massively parallel reporter assay methodology. SRESs were cloned upstream of a minimal promoter in a tagged luciferase library and then assayed in vivo using hydrodynamic tail vein injection. Livers were dissected 24 h after injection, mRNA was generated, and tags were reverse transcribed and sequenced. (c) Bimodal distribution of expression values for 4,966 SRESs. Expression values were calculated using the equation shown. (d) Template-template correlation. Expression values for 2,217 pairs of SRESs (not all SRESs had data for both templates owing to quality control measures) containing the exact same patterns of consensus binding sequences on 2 separate templates are plotted. The red line is a linear regression trace, whereas the dashed line is the diagonal. Template 1, hg19 chr. 9: 83,712,599–83,712,766; template 2, hg19 chr. 2: 211,153,238–211,153,405. (e) Expression values from the three mice in which the SRES library was tested, exhibiting a very high level of correlation.

  2. Homotypic amplification of expression is compatible with a subset of transcription factor binding sites, independent of their spacing.

    (a) We observed significant correlation between expression and the size of the homotypic cluster for 5 of the 12 transcription factor binding sites (CEBPA, FOXA1, HNF1A, ONECUT1 and XBP1). The PPARA binding site is included as an example of a site that could not be homotypically amplified. Included on the right are box plots for the background expression of all SRESs with a single binding site (B), as well as for the positive (+) and negative (−) controls. Red boxes denote groups of SRESs with significantly higher expression compared to background (Wilcoxon rank-sum test P ≤ 0.05), which is a slightly more stringent test than comparison against negative controls. P values refer to Spearman's correlation coefficients (corrected for multiple testing using FDR). In the box plots, the central rectangle spans the first and third quartiles, the line inside the rectangle is the median, and the lines beyond the box indicate the locations of the minimum and maximum values. (b) In the vast majority of cases, the strength of expression was not dependent on the distance between binding sites, as observed for class I elements. Shown are examples of SRESs, each with two copies of one of the three strongest transcription factor binding sites, including sites for CEBPA, HNF1A and XBP1. P values refer to Spearman's correlation coefficients, and the dashed gray lines are the regression traces.

  3. Heterotypic elements drive stronger expression than homotypic ones.

    (a) Density of expression by SRES class. The density of expression is plotted for each class of SRES. Red dashed lines denote the mean expression value for each class. (b) Box plot of expression by class and number of sites. Note that there are no class II SRESs with 1 site or class III SRESs with <3 sites. In the box plots, the central rectangle spans the first and third quartiles, the line inside the rectangle is the median, and the lines beyond the box indicate the locations of the minimum and maximum values.

  4. Combinatorial effects in heterotypic clusters of two different transcription factor binding sites.

    (a) Significant interactions that were identified from a general model of class I and class II SRES–driven expression using the χ² goodness-of-fit test or direct comparisons between class I and class II data sets. Dotted gray lines refer to the 19 combinations (of a possible 121) that were sampled in class II SRESs. Red lines indicate significant synergy, and a black line indicates significant interference (P values ≤ 0.05 in all cases). (b) Direct comparisons between the eight pairs of interacting binding sites and predicted expression based on class I data alone (black lines). Five interactions (NR2F2-XBP1, NR2F2-ONECUT1, NR2F2-FOXA1, RXRA-XBP1 and HNF1A-XBP1) were identified by including combinatorial terms in the model (Wald χ² test P values shown at top right), whereas three (FOXA1-RXRA, FOXA1-PPARA and TFAP2C-RXRA) were identified by directly comparing expression from binding site pairs in combinations with homotypic sequences containing an equal number of the binding sites in isolation (Wilcoxon rank-sum test). A single example is shown for each combination, corresponding to a fixed number of sites (the number of sites in each SRES is given in parentheses, and asterisks to the left of P values indicate those that are significant). In the box plots, the central rectangle spans the first and third quartiles, the line inside the rectangle is the median, and the lines beyond the box indicate the locations of the minimum and maximum values.

  5. Synthetic enhancers mimic mouse liver enhancers.

    (a) Categorization of 51,850 putative mouse liver enhancers into the 3 SRES classes on the basis of the distribution of the 12 transcription factor binding sites. (b) Frequencies of transcription factor binding site pairs for each of the 8 interactions identified in Figure 4 in putative mouse liver enhancers versus 103,700 matched random genomic controls.

  6. Effects of binding site order in heterotypic enhancers.

    (a) Depiction of the 10 combinations (out of 39 total combinations) of 3 transcription factor binding sites with a favorable permutation resulting in significantly stronger expression (FDR < 0.05, Wilcoxon rank-sum test). The difference in expression between the best and worst permutation is depicted on the left. (b) Rank-value plot depicting 9 sets of SRESs containing ~49 permutations, each of the same 8 transcription factor binding sites. (c) HepG2 and mouse liver SRES expression strongly agree for the entire set of 441 permutations of 8 transcription factor binding sites. The red line is the regression trace. A specific example of the best and worst permutation for one of these sets (shown in purple in the plots in b,c) appears at the bottom.
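Several of the captions above report P values corrected for multiple testing using FDR. The captions do not name the specific procedure, so the following is only a minimal sketch that assumes the Benjamini-Hochberg step-up method. The function name `benjamini_hochberg` and the example P values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: this is the FDR procedure being illustrated; the captions
    only say "FDR" without naming a method.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the P values, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the adjusted
    # values can be forced to be monotonically non-decreasing in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative P values only (invented for this example).
example = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example)
significant = [q < 0.05 for q in adjusted]
```

A test would then be called significant at FDR < 0.05 when its adjusted value falls below 0.05, as in the `significant` list.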

Accession codes

Primary accessions

Sequence Read Archive

Author information

  1. These authors contributed equally to this work.

    • Robin P Smith &
    • Leila Taher

Affiliations

  1. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, USA.

    • Robin P Smith,
    • Mee J Kim,
    • Fumitaka Inoue &
    • Nadav Ahituv
  2. Institute for Human Genetics, University of California, San Francisco, San Francisco, California, USA.

    • Robin P Smith,
    • Mee J Kim,
    • Fumitaka Inoue &
    • Nadav Ahituv
  3. Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, US National Institutes of Health, Bethesda, Maryland, USA.

    • Leila Taher &
    • Ivan Ovcharenko
  4. Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Rostock, Germany.

    • Leila Taher
  5. Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

    • Rupali P Patwardhan &
    • Jay Shendure

Contributions

R.P.S., L.T., R.P.P., J.S., I.O. and N.A. conceived key aspects of the project and planned experiments. R.P.S., R.P.P., M.J.K. and F.I. performed experiments. L.T., R.P.S. and R.P.P. analyzed data. R.P.S., L.T., R.P.P., I.O., J.S. and N.A. wrote the manuscript. All authors commented on and revised the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2,694.781 KB)

    Supplementary Figures 1–10, Supplementary Tables 1–3 and 6, and Supplementary Note

Excel files

  1. Supplementary Table 4 (518,516 KB)

    Complete listing of 4,966 SRESs (no controls) tested in the study

  2. Supplementary Table 5 (98,422 KB)

    Summary of expression for the best and worst permutations from 211 sets of SRESs containing the same complement of transcription factor binding sites in different orders
