Promoter shape varies across populations and affects promoter evolution and expression noise



Animal promoters initiate transcription either at precise positions (narrow promoters) or dispersed regions (broad promoters), a distinction referred to as promoter shape. Although promoter shape is highly conserved, the functional properties of promoters with different shapes and the genetic basis of their evolution remain unclear. Here we used natural genetic variation across a panel of 81 Drosophila lines to measure changes in transcriptional start site (TSS) usage, identifying thousands of genetic variants affecting transcript levels (strength) or the distribution of TSSs within a promoter (shape). Our results identify promoter shape as a molecular trait that can evolve independently of promoter strength. Broad promoters typically harbor shape-associated variants, with signatures of adaptive selection. Single-cell measurements demonstrate that variants modulating promoter shape often increase expression noise, whereas heteroallelic interactions with other promoter variants alleviate these effects. These results uncover new functional properties of natural promoters and suggest the minimization of expression noise as an important factor in promoter evolution.


Figure 1: Identification of tssQTL using 5′ CAGE and PC analysis.
Figure 2: Classification of tssQTLs.
Figure 3: Genomic properties of variants affecting promoter shape or transcript abundance.
Figure 4: Core promoter motifs affected in tssQTLs.
Figure 5: Promoter shape influences promoter evolution.
Figure 6: Changes in promoter shape are associated with increased expression noise.




We thank all members of E.E.M.F.'s laboratory for discussions and comments. We are very grateful to P. Carninci and A.M. Suzuki for support regarding the CAGE protocol. This work was technically supported by the EMBL Genomics and Flow Cytometry Core Facilities. This work was financially supported by the European Research Council (FP/2007-2013)/ERC grant agreement 322851 (ERC advanced grant CisRegVar) to E.E.M.F. and by post-doctoral fellowships from the Human Frontier Science Program (HFSP) and EMBO to J.F.D.

Author information

E.E.M.F., I.E.S., and O.S. designed the study, explored the results and prepared and edited the manuscript, together with contributions from all authors, including E.B. I.E.S. performed all CAGE and FACS experiments and led data analysis associated with QTL effects and frequency. J.F.D. developed the QTL calling and classification procedure with contributions from I.E.S., F.P.C., H.S., M.S., and O.S. H.S. and M.S. developed the wavelet analysis. D.H. performed all data analysis downstream of QTL calling. E.C. provided staged embryos and RNA. D.A.G. performed natural selection analysis.

Correspondence to Oliver Stegle or Eileen E M Furlong.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Complex molecular phenotypes captured by the PCA-based QTL-calling approach.

Top, three synthetic examples of QTL illustrating three aspects of the CAGE signal: shape, level, and position of the highest peak within the cluster. Middle, loadings for the first principal component when genotypes are simulated from these hypothetical QTL. Bottom, separation of hypothetical genotype classes based on either projections onto the top principal component (y-axis) or the average CAGE signal across the entire window (x-axis). While genotype classes in the middle column can be distinguished by both the mean signal and the projections on the first PC, genotypes in the left and right columns are separated by the first PC but not by the mean signal.

Supplementary Figure 2 tssQTL generally have effects that are independent of developmental stage.

Shown are two representative tssQTL with effects at all three embryonic stages. a, Heatmaps of CAGE signal within a promoter window, with individual genotypes (rows) grouped into major (red, above) and minor (blue, below), for the same examples shown in Fig. 1d. Lead variants are chrX_13562404_SNP (CG11164) and chr3L_8103470_SNP (CG7927). b, Manhattan plots, showing the log10(p-values) for all tested variants in each case. Significant associations tend to be common to all three developmental time-points and not stage-specific (results of time-specific effect tests are in Supplementary Tables 2 and 3).

Supplementary Figure 3 tssQTL in genes with multiple promoters.

Left, histograms showing the number of QTL-associated promoters per gene, for 1,610 genes with more than one associated CAGE window (varying between 2 and 8 per gene). Right, expected number of QTL-associated promoters per gene, assuming that the probability of tssQTL across promoters is uniform (no dependence within a gene). The observed and predicted distributions are not significantly different (p = 0.9986, two-sided Kolmogorov-Smirnov test).
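One way to read the "uniform probability" expectation described above is as a binomial model: if every promoter carries a tssQTL independently with the same probability p, a gene with n promoters has k QTL-associated promoters with probability C(n, k) p^k (1 - p)^(n - k). The following is a minimal sketch under that assumption. The function name, the example promoter counts and the value of p are hypothetical and are not taken from the study, and the Kolmogorov-Smirnov comparison is not reproduced here.

```python
from collections import Counter
from math import comb


def expected_qtl_counts(promoters_per_gene, p):
    """Expected number of genes with k QTL-associated promoters (k = 0..max n).

    Assumes each promoter has a tssQTL independently with probability p,
    so the count per gene is Binomial(n, p) for a gene with n promoters.
    """
    max_n = max(promoters_per_gene)
    expected = [0.0] * (max_n + 1)
    for n, n_genes in Counter(promoters_per_gene).items():
        for k in range(n + 1):
            expected[k] += n_genes * comb(n, k) * p**k * (1 - p) ** (n - k)
    return expected


# Hypothetical input: two genes with 2 promoters, one gene with 3 promoters.
example = expected_qtl_counts([2, 2, 3], p=0.5)
```

The expected counts sum to the number of genes, so they can be compared directly with an observed histogram of the same genes.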

Supplementary Figure 4 Wavelet analysis.

a, Representation of different wavelet coefficients in an example 30 bp window. b, Detailed relationship between CAGE signal (white in heatmap above) and single-base effect sizes (below) in the genomic space (left) and the evidence of effect on the different wavelet coefficients (right), for the CG7927 example (included in Fig. 2a). Significant effects are shown in red.

Supplementary Figure 5 Validation of lead tssQTL variants.

The distribution of single-cell expression values for tssQTL, showing promoter strength differences between Maj, Min and Majmin promoter variants, extending those shown in Fig. 4b. The promoter single-cell assay quantifies promoter activation of sfGFP in individual cells, measured by analytic FACS. We show the distribution of expression values (sfGFP/mCherry ratios) for both natural promoter variants (Maj and Min) and engineered promoters, placing the lead SNP into the Maj haplotype (Majmin). Dashed black line: population median. Grey error bars: average expression value ± SEM, calculated from 2 (wls, CG15739 and Hn (Min, Majmin1 and Majmin12)) or 3 (CG17802, l(3)01239, CG11210 and Hn (Maj and Majmin2)) transfection replicates. * p-values from two-sided Welch t-test comparing population averages. # p-values from one-sided Welch t-test. Lead variants are: chr3L_11162703_SNP (wls), chr3R_13630942_SNP (CG17802), chr3L_11067477_SNP (l(3)01239), chr2R_3986212_SNP (CG11210), chrX_11767477_SNP (CG15739), chr3L_7753664_SNP (Hn variant 1) and chr3L_7753657_SNP (Hn variant 2).

Supplementary Figure 6 Effects of changes in positioned downstream motifs on promoter shape and strength.

Change in raw CAGE signal and shape index between the minor and the major genotypes for tssQTL affecting DPE- (downstream promoter element) and MTE- (motif ten element) like motifs located 10–40 bp downstream of the most strongly affected TSS. Created or destroyed indicates a motif score that is higher or lower in the minor allele variant, respectively. Symbols indicate aggregate promoter shape (see Methods).

Supplementary Figure 7 Motif-based classification of promoters and their relationship to tssQTL.

We used the classification of Drosophila promoters proposed by Ohler et al., which is based on five motif-based classes (refs. 1, 2): Inr+TATA, Inr+DPE and Inr-only, characteristic of narrow promoters, and DRE and Motif1/6 motifs, characteristic of broad promoters. For each promoter class (defined by this motif classification, ref. 1), the following is shown: left, positioning of the indicated motifs relative to the main TSS; middle, fraction of promoters with a tssQTL; right, the relative proportion of tssQTL classes. The classes corresponding to broad promoters (indicated at the bottom) show an elevated number of tssQTL (2.84-fold increase in the broad classes compared to narrow, p = 2.2 × 10^-16, Fisher's exact test) and a higher proportion of shape QTL, recapitulating the differences observed between broad and narrow promoters classified based on their shape index (Fig. 5a).

1. Ohler, U. Identification of core promoter modules in Drosophila and their application in accurate transcription start site prediction. Nucleic Acids Res. 34, 5943–5950, doi:10.1093/nar/gkl608 (2006).

2. Ohler, U., Liao, G.C., Niemann, H. & Rubin, G.M. Computational analysis of core promoters in the Drosophila genome. Genome Biol. 3, RESEARCH0087 (2002).

Supplementary Figure 8 Estimated detection power of abundance and shape QTL in different promoter shape classes.

Simulation study to assess statistical power for detecting abundance or shape QTL in broad and narrow promoters and for different effect sizes. Given the small number of TSSs within a narrow promoter, detection of shape QTL within them may be inherently more difficult, potentially leading to their underestimation. Signal distribution, minor allele frequency and library sizes were simulated to match those observed in real data. Top panel: simulated shape QTL, where increasing proportions (indicated on top) of the main peak were redistributed in the second genotype group. Bottom panel: simulated abundance QTL, where increasing proportions of bases within promoter windows (indicated on top) are affected in the same direction with increasing fold-change effect (x-axis). After generating the simulated counts (using a Poisson distribution), the simulated CAGE profiles were processed analogously to the real data, including normalization and principal component analysis to obtain the projections onto the first three PCs. Significance was assessed using a t-test for each PC to test differences between genotype classes. The proportion (error bars indicate 95% CI) of simulated windows for which significant differences could be detected (power at p < 0.05) is indicated on the y-axis.
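The simulation logic described above can be illustrated with a small, simplified sketch of the shape-QTL case: Poisson counts are drawn for two genotype groups, profiles are normalized by library size, projected onto the first principal component, and the groups are compared with a Welch t statistic. This is not the authors' code. It uses only the first PC rather than the first three, replaces the t-test p-value with an approximate critical value (|t| > 2.02, roughly p < 0.05 for 20 samples per group), and all names, window sizes and signal levels are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson variate (Knuth's method; adequate for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def first_pc_scores(rows, iters=30):
    """Project mean-centred rows onto their first PC (power iteration)."""
    n, w = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(w)]
    x = [[r[j] - means[j] for j in range(w)] for r in rows]
    v = [1.0 / (j + 1) for j in range(w)]  # arbitrary non-zero start vector
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(r, v)) for r in x]
        v = [sum(s * r[j] for s, r in zip(scores, x)) for j in range(w)]
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        v = [c / norm for c in v]
    return [sum(a * b for a, b in zip(r, v)) for r in x]


def welch_t(a, b):
    """Welch t statistic for two samples (0.0 if both have no variance)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0


def simulate_power(shift, n_per_group=20, n_sims=100, width=10,
                   peak=40.0, flank=4.0, seed=0):
    """Fraction of simulated windows in which a shape change is detected.

    `shift` is the proportion of the main peak redistributed evenly over
    the other positions in the second genotype group (total signal is kept).
    """
    rng = random.Random(seed)
    major = [flank] * width
    major[width // 2] = peak
    minor = [flank + shift * peak / (width - 1)] * width
    minor[width // 2] = peak * (1 - shift)
    hits = 0
    for _ in range(n_sims):
        profiles = [major] * n_per_group + [minor] * n_per_group
        rows = [[poisson(lam, rng) for lam in p] for p in profiles]
        rows = [[c / max(sum(r), 1) for c in r] for r in rows]  # library size
        scores = first_pc_scores(rows)
        t = welch_t(scores[:n_per_group], scores[n_per_group:])
        hits += abs(t) > 2.02  # approximate two-sided 5% critical value
    return hits / n_sims


power_null = simulate_power(0.0)     # no shape change: false-positive rate
power_strong = simulate_power(0.5)   # half of the main peak redistributed
```

Running `simulate_power` over a range of `shift` values, or with a smaller `width` to mimic narrow promoters, gives a power curve comparable in spirit to the panels described above.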

Supplementary Figure 9 Number of bases under negative selection at different promoter shapes.

INSIGHT analysis on bins of promoter windows according to their shape indexes. Brown shading indicates the 30% broadest or narrowest promoters. rho = fraction of sites under negative selection, E(w) = the number of sites under weak negative selection. All windows have the same size (1,024 bp). Error bars represent uncertainty estimates for INSIGHT parameter estimates (standard errors), which are derived from the curvature of INSIGHT's likelihood function (ref. 1).

1. Gronau, I., Arbiza, L., Mohammed, J. & Siepel, A. Inference of natural selection from interspersed genomic elements based on polymorphism and divergence. Mol. Biol. Evol. 30, 1159–1171, doi:10.1093/molbev/mst019 (2013).

Supplementary Figure 10 Patterns of genetic variation at different promoter shapes.

a, Broad promoters show an increase in the fraction of non-ancestral sites (y-axis) for promoters with different shape indexes (x-axis). Brown shading indicates the top 30% for each shape type. Both the number of polymorphic sites where none of the alleles is the inferred ancestral allele (top), and the number of fixed (monomorphic) sites diverging from the ancestral state (middle) show a clear dependence on promoter shape, with the broadest promoters having elevated rates for the non-ancestral states. In contrast, polymorphism rates are equal or reduced in broader shape bins compared to narrow (bottom), arguing against elevated substitution rates being strictly a function of differences in the fraction of functional bases. b, Average phastCons scores for windows of the indicated shape indexes show evidence for increased conservation in narrow promoter categories. Scores were obtained from the UCSC Genome Browser. In all cases, error bars correspond to 95% CI.

Supplementary Figure 11 Examples of changes in expression level and noise due to genetic variants affecting promoter shape.

Promoter single-cell assay using analytic FACS (as in Fig. 4a). Population median indicates expression level, while median absolute deviation from the median indicates cell-to-cell variation (expression noise). Four promoters harboring shape or mixed QTL with a predominant shape effect (extension of Fig. 6b,c). Lead variants are chr2L_10431345_SNP (TfIIB), chr3L_1307602_INS (CG2469), chrX_6161845_SNP (Spt6) and chr3L_11162703_SNP (wls). Expression level is the mean of the population medians, and noise is the mean of population MADs (3 transfection replicates, >15,000 cells per construct), with bars corresponding to SEM. Green and purple arrows indicate the effect of the lead variant mutation alone, while orange arrows indicate the effect of changing all remaining variants in the transition to the opposite natural haplotype. Significance was assessed by a Levene test for homogeneity of variances; only significant p-values (< 0.05) are shown.
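The summary statistics described above (population median as expression level, median absolute deviation as noise, averaged over transfection replicates with SEM) can be sketched as follows. This is an illustrative example only. The function names and the toy values are hypothetical, the inputs stand in for per-cell sfGFP/mCherry ratios, and the Levene test is not implemented here.

```python
import statistics


def median_and_mad(values):
    """Return (median, median absolute deviation from the median)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return med, mad


def summarize_construct(replicates):
    """Summarize one promoter construct from per-replicate cell measurements.

    `replicates` is a list of lists, one list of per-cell ratios per
    transfection replicate. Level is the mean of replicate medians and
    noise is the mean of replicate MADs, each reported with its SEM.
    """
    medians, mads = zip(*(median_and_mad(r) for r in replicates))
    n = len(replicates)

    def sem(xs):
        return statistics.stdev(xs) / n ** 0.5

    return {
        "level": statistics.mean(medians),
        "level_sem": sem(medians),
        "noise": statistics.mean(mads),
        "noise_sem": sem(mads),
    }


# Toy data: three replicates with a handful of "cells" each.
summary = summarize_construct([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
robust = median_and_mad([1, 2, 3, 4, 100])  # one extreme cell barely matters
```

Using the median and MAD rather than the mean and standard deviation keeps both statistics insensitive to a small number of extreme cells, which is common in flow cytometry data.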

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–11, Supplementary Table 5, and Supplementary Note (PDF 3127 kb)

Supplementary Table 1

CAGE windows. (TXT 2213 kb)

Supplementary Table 2

All tssQTLs before filtering. (TXT 1325 kb)

Supplementary Table 3

High-confidence tssQTLs after filtering. (TXT 1348 kb)

Supplementary Table 4

Promoter motifs. (TXT 45 kb)

Supplementary Table 6

CAGE library statistics. (TXT 18 kb)

Supplementary Table 7

CAGE clusters. (TXT 3593 kb)


Cite this article

Schor, I., Degner, J., Harnett, D. et al. Promoter shape varies across populations and affects promoter evolution and expression noise. Nat Genet 49, 550–558 (2017) doi:10.1038/ng.3791
