Targeted engineering of plant gene expression holds great promise for ensuring food security and for producing biopharmaceuticals in plants. However, this engineering requires thorough knowledge of cis-regulatory elements to precisely control either endogenous or introduced genes. To generate this knowledge, we used a massively parallel reporter assay to measure the activity of nearly complete sets of promoters from Arabidopsis, maize and sorghum. We demonstrate that core promoter elements—notably the TATA box—as well as promoter GC content and promoter-proximal transcription factor binding sites influence promoter strength. By performing the experiments in two assay systems, leaves of the dicot tobacco and protoplasts of the monocot maize, we detect species-specific differences in the contributions of GC content and transcription factors to promoter strength. Using these observations, we built computational models to predict promoter strength in both assay systems, allowing us to design highly active promoters comparable in activity to the viral 35S minimal promoter. Our results establish a promising experimental approach to optimize native promoter elements and generate synthetic ones with desirable features.
All sequencing results are deposited in the NCBI Sequence Read Archive under the BioProject accession PRJNA714258.
The code used in this study is available on GitHub (https://github.com/tobjores/Synthetic-Promoter-Designs-Enabled-by-a-Comprehensive-Analysis-of-Plant-Core-Promoters).
We thank A. Gutierrez Diaz and E. Grotewold for providing maize TSS data, and A. Gallavotti for providing maize B73 seeds. This work was supported by the National Science Foundation (RESEARCH-PGR grant no. 1748843 to E.S.B., S.F. and C.Q.), the German Research Foundation (DFG; fellowship no. 441540116 to T.J.) and the National Institutes of Health (T32 training grant no. HG000035 to J.T. and R01-GM079712 to C.Q. and J.T.C.).
The authors declare no competing interests.
Peer review information Nature Plants thanks Philip Benfey, Shira Weingarten-Gabbay and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Promoter strength and in vivo expression levels of corresponding genes are not correlated.
a, Correlation (Pearson’s r) between the promoter strength and expression levels of the corresponding genes in the indicated species. Each boxplot (centre line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range; points, outliers) represents the correlation for all individual tissue samples in the RNA-seq dataset (see Methods). The number of samples in the RNA-seq dataset is indicated at the bottom of the plot. b,c, Examples of the correlation between gene expression (Arabidopsis adult cotyledon (b) or maize root cortex (c) samples) and promoter strength as determined in tobacco leaves (b) or maize protoplasts (c). These examples correspond to the highest correlations in (a).
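The per-sample correlation summarized in panel a can be illustrated with a minimal sketch. It assumes promoter strengths and expression levels are available as plain dictionaries keyed by gene; the function names, data layout and toy values below are hypothetical and are not taken from the study's code.

```python
import math


def pearson_r(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def per_sample_correlations(promoter_strength, expression_by_sample):
    """Correlate promoter strength with expression in each tissue sample.

    promoter_strength: dict mapping gene -> measured promoter strength
    expression_by_sample: dict mapping sample -> dict of gene -> expression
    Returns a dict mapping sample -> Pearson's r over the shared genes.
    """
    result = {}
    for sample, expression in expression_by_sample.items():
        genes = sorted(set(promoter_strength) & set(expression))
        result[sample] = pearson_r(
            [promoter_strength[g] for g in genes],
            [expression[g] for g in genes],
        )
    return result


# Toy, made-up values for illustration only (not data from the study).
strength = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0}
expression = {
    "leaf": {"g1": 2.0, "g2": 4.0, "g3": 6.0, "g4": 8.0},
    "root": {"g1": 4.0, "g2": 3.0, "g3": 2.0, "g4": 1.0},
}
correlations = per_sample_correlations(strength, expression)
```

Each value in `correlations` corresponds to one point summarized by a boxplot in panel a: one Pearson's r per tissue sample.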
Extended Data Fig. 2 Strength of maize promoters depends on the TATA box location in maize protoplasts.
a, Histogram showing the percentage of maize promoters with a TATA box at the indicated position (reproduced from Fig. 4). Three peaks in the distribution of TATA boxes are highlighted in grey. Peak 1 spans bases −72 to −65, peak 2 spans bases −59 to −50, and peak 3 spans bases −34 to −24. b, Violin plots, boxplots and significance levels (as defined in Fig. 2) of promoter strength for maize promoters without enhancer in the indicated assay system. Promoters without a TATA box (−) were compared to those with a TATA box outside (+/−) or within one of the three peaks highlighted in (a).
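The grouping used in panel b can be sketched as a small classifier. The peak boundaries come from the caption; treating the TATA box position as a single integer coordinate relative to the TSS is an assumption here (the caption does not say whether the position refers to the start or the centre of the motif), and the function name is hypothetical.

```python
# Peak boundaries (inclusive, relative to the TSS) as given in the caption.
TATA_PEAKS = {
    "peak 1": (-72, -65),
    "peak 2": (-59, -50),
    "peak 3": (-34, -24),
}


def classify_tata(position):
    """Assign a promoter to a TATA box group.

    position: integer TATA box position relative to the TSS, or None
              if the promoter has no TATA box (assumed representation).
    Returns "-" (no TATA box), "+/-" (TATA box outside all peaks),
    or the name of the peak that contains the position.
    """
    if position is None:
        return "-"
    for name, (start, end) in TATA_PEAKS.items():
        if start <= position <= end:
            return name
    return "+/-"


# Example positions chosen for illustration only.
groups = [classify_tata(p) for p in (None, -30, -45, -72, -50)]
```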
Extended Data Fig. 3
a-d, Violin plots of promoter strength in tobacco leaves (a,c) or maize protoplasts (b,d). Promoters with a strong or intermediate TATA box (motif score ≥ 0.7; see Methods) were grouped by GC content and split into promoters without (left half, darker colour) or with (right half, lighter colour) a BREu (a,b) or BREd (c,d) element. Violin plots, boxplots and significance levels are as defined in Fig. 2. Only one half is shown for violin plots. e,f, Logoplots for promoters with a BREu (e) or BREd (f) before (WT) and after (mut) introducing mutations that disrupt the elements. g, Logoplots for promoters without a BRE (WT) and with an inserted BREu (+ BREu) or BREd (+ BREd) element. h, Boxplots and significance levels (as defined in Fig. 4) for the relative strength of the promoter variants shown in (e-g). The corresponding WT promoter was set to 0 (horizontal black line).
Extended Data Fig. 4
a, Histogram showing the percentage of promoters with a TATA box at the indicated position. b,c, Violin plots of promoter strength in tobacco leaves (b) or maize protoplasts (c). Promoters were grouped by GC content and split into promoters without (left half, darker colour) or with (right half, lighter colour) a Y patch. Violin plots, boxplots and significance levels are as defined in Fig. 2. Only one half is shown for violin plots.
Extended Data Fig. 5
a-d, Violin plots of promoter strength in tobacco leaves (a,c) or maize protoplasts (b,d). Promoters were grouped by GC content and split into promoters without (left half, darker colour) or with (right half, lighter colour) an Inr (a,b) or TCT (c,d) element at the TSS. Violin plots, boxplots and significance levels are as defined in Fig. 2. Only one half is shown for violin plots.
Extended Data Fig. 6 Transcription factor binding sites contribute to promoter strength in an assay system-dependent manner.
a-d, Violin plots of promoter strength for libraries without enhancer in tobacco leaves (a,c) or maize protoplasts (b,d). Promoters were grouped by GC content and split into promoters without (left half, darker colour) or with (right half, lighter colour) a binding site for TCP (a,b) or HSF (c,d) transcription factors. Violin plots, boxplots and significance levels are as defined in Fig. 2. Only one half is shown for violin plots.
Extended Data Fig. 7
a-c, Histograms showing the number of promoters with a TCP (a), HSF (b), or NAC (c) transcription factor binding site at the indicated position. d-i, Violin plots, boxplots and significance levels (as defined in Fig. 2) of promoter strength for libraries without enhancer in tobacco leaves (d-f) or maize protoplasts (g-i). Promoters were grouped by the position of their TCP (d,g), HSF (e,h), or NAC (f,i) transcription factor binding site relative to the TATA box: either upstream (up) or downstream (down).
Extended Data Fig. 8 Promoter-proximal transcription factor binding sites influence enhancer responsiveness.
a-f, Violin plots of enhancer responsiveness in tobacco leaves (a,c,e) or maize protoplasts (b,d,f). Promoters were grouped by GC content and split into promoters without (left half, darker colour) or with (right half, lighter colour) a TCP (a,b), WRKY (c,d), or B3 (e,f) transcription factor binding site. Violin plots, boxplots and significance levels are as defined in Fig. 2. Only one half is shown for violin plots.
Extended Data Fig. 9
a-c, One or two T > G mutations were introduced in binding sites for TCP (a,b) or WRKY (c) transcription factors. The orientation of a binding site in the wild-type promoter determined the bases that were mutated. d, Boxplots and significance levels (as defined in Fig. 4) for the relative light-dependency of promoters harbouring mutations in the indicated transcription factor binding site as shown in (a-c). The corresponding wild-type promoter was set to 0 (horizontal black line).
Extended Data Fig. 10
a,b, 150 native and 160 synthetic promoters were subjected to 10 rounds of in silico evolution and the strength of the evolved promoters was predicted with the tobacco model (a) or the maize model (b). The black line represents the median promoter strength after each round. c,d, Correlation (Pearson’s R² and Spearman’s ρ) between the predicted and experimentally determined strength of promoters after 0, 3, or 10 rounds of in silico evolution. Promoter strengths measured in tobacco leaves were compared to predictions from the tobacco model (c) and the data from maize protoplasts was compared to the predictions from the maize model (d). The models used for the in silico evolution are indicated on each plot.
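The idea of round-based in silico evolution can be sketched as follows. This is only an illustration under stated assumptions: it uses a greedy scheme (score every single-base substitution in each round and keep the best one), which may differ from the procedure in the study, and `toy_model` is a hypothetical stand-in scorer, not the tobacco or maize model.

```python
BASES = "ACGT"


def toy_model(seq, target="TATAAATAGC"):
    """Hypothetical scorer: counts positions matching an arbitrary target.

    It only stands in for a trained predictor of promoter strength.
    """
    return sum(a == b for a, b in zip(seq, target))


def evolve(seq, model, rounds=10):
    """Greedy in silico evolution (assumed scheme).

    In each round, every single-base substitution is scored with `model`
    and the best-scoring variant replaces the current sequence if it
    improves on it. Returns the sequences after 0..rounds rounds.
    """
    history = [seq]
    for _ in range(rounds):
        best_seq, best_score = seq, model(seq)
        for i in range(len(seq)):
            for base in BASES:
                if base == seq[i]:
                    continue
                candidate = seq[:i] + base + seq[i + 1:]
                score = model(candidate)
                if score > best_score:
                    best_seq, best_score = candidate, score
        seq = best_seq
        history.append(seq)
    return history


# Made-up starting sequence for illustration only.
history = evolve("GGGGGGGGGG", toy_model, rounds=10)
scores = [toy_model(s) for s in history]
```

Tracking `scores` across rounds mirrors the per-round view in panels a and b, where predicted strength is reported after each round of evolution.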
Cite this article
Jores, T., Tonnies, J., Wrightsman, T. et al. Synthetic promoter designs enabled by a comprehensive analysis of plant core promoters. Nat. Plants 7, 842–855 (2021). https://doi.org/10.1038/s41477-021-00932-y