Learning to read and write the transcriptional regulatory code is of central importance to progress in genetic analysis and engineering. Here we describe a massively parallel reporter assay (MPRA) that facilitates the systematic dissection of transcriptional regulatory elements. In MPRA, microarray-synthesized DNA regulatory elements and unique sequence tags are cloned into plasmids to generate a library of reporter constructs. These constructs are transfected into cells and tag expression is assayed by high-throughput sequencing. We apply MPRA to compare >27,000 variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon-β enhancer. We first show that the resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution. We then use the data to train quantitative sequence-activity models (QSAMs) of the two enhancers. We show that QSAMs from two cellular states can be combined to design enhancer variants that optimize potentially conflicting objectives, such as maximizing induced activity while minimizing basal activity.
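As a rough illustration of the modeling idea in the abstract, the sketch below fits an additive (linear) sequence-activity model to measured activities and then combines an induced-state model with a basal-state model to rank variants. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy 3-nucleotide sequences, the noise-free toy activities, and the choice of "induced minus basal" as the combined objective are all hypothetical.

```python
from itertools import product

import numpy as np

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as a flat indicator vector of length 4 * len(seq)."""
    vec = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        vec[i, NUCLEOTIDES.index(base)] = 1.0
    return vec.ravel()


def fit_linear_qsam(seqs, log_activity):
    """Least-squares fit of an additive model: intercept plus one weight
    per (position, nucleotide) pair. The design matrix is rank deficient,
    so lstsq returns the minimum-norm solution; predictions are unaffected."""
    X = np.array([one_hot(s) for s in seqs])
    X = np.hstack([np.ones((len(seqs), 1)), X])
    weights, *_ = np.linalg.lstsq(X, np.asarray(log_activity, dtype=float), rcond=None)
    return weights


def predict(weights, seq):
    """Predicted (log) activity of a sequence under a fitted model."""
    return weights[0] + one_hot(seq) @ weights[1:]


def inducibility(w_induced, w_basal, seq):
    """Hypothetical combined objective: high induced activity, low basal activity."""
    return predict(w_induced, seq) - predict(w_basal, seq)


# Toy data (assumed, not from the paper): all 64 variants of a 3-nt element.
seqs = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]
induced = [1.0 + 2.0 * (s[0] == "A") for s in seqs]  # 'A' at position 0 boosts induced activity
basal = [0.5 + 1.0 * (s[2] == "G") for s in seqs]    # 'G' at position 2 raises basal activity

w_ind = fit_linear_qsam(seqs, induced)
w_bas = fit_linear_qsam(seqs, basal)
best = max(seqs, key=lambda s: inducibility(w_ind, w_bas, s))
```

On this toy data the top-ranked variant carries the induction-boosting base and avoids the base that raises basal activity. In a real analysis the inputs would be activities estimated from tag counts, and the objective would be chosen to suit the design goal.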
Gene Expression Omnibus
The authors would like to thank E.M. LeProust and S. Chen of Agilent for oligonucleotide library synthesis, R.P. Deering for assistance with Sendai virus infections and the staff of the Broad Institute and the Bauer Core facilities for assistance with data generation. This project was supported by funds from the Broad Institute, the Harvard Stem Cell Institute (T.S.M.), National Human Genome Research Institute grant R01HG004037 (M.K.), the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory (J.B.K.), National Science Foundation (NSF) grant PHY-0957573 (C.G.C., T.T.) and NSF grant PHY-1022140 (A. Mur.).
A patent application describing ideas presented in this article has been filed by the Broad Institute.
Supplementary Tables 5 and 6, Supplementary Notes and Supplementary Figs. 1–10 (PDF 5994 kb)
CRE variants (XLSX 5145 kb)
IFNB variants (XLSX 3389 kb)
CRE mutagenesis/models (XLSX 39 kb)
IFNB mutagenesis/models (XLSX 36 kb)